Member of Technical Staff - Reliability Engineering
Define and implement reliability systems for a growing AI cloud infrastructure platform, including architectural improvements, operational processes, monitoring, and incident response. Requires 5+ years of production coding experience, 2+ years of on-call experience, and strong cloud skills.