Staff Site Reliability Engineer
218k – 260k · Mountain View, CA · DevOps / SRE · Onsite · 10+ YOE
Summary
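Leads the infrastructure transformation from monoliths to scalable microservices at massive scale, architects observability and CI/CD frameworks, unifies diverse infrastructure components into a cohesive platform, and mentors a team of 6 engineers. Requires 10+ years of hands-on coding on internal platforms and tools, 5+ years of cloud experience (GCP preferred, AWS acceptable), and a Bachelor's degree in Computer Science or a related field.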
About the role
Responsibilities
- Execute on the transformation from monolith to scalable microservices (API/Platform focus).
- Drive initiatives to continually improve reliability, with a deep understanding of the implications of each additional “9” of availability.
- Architect systems and write code that enables application teams to adopt best practices by default—not by instruction.
- Integrate and unify diverse infrastructure components into a cohesive, scalable platform within a massive tech stack.
- Design observability, reliability, and CI/CD frameworks to support growth and operational excellence at scale.
- Collaborate cross-functionally with product, application, and integration teams to align infrastructure direction with business goals.
- Provide technical leadership to shift the team from reactive support to a proactive, strategic function.
- Mentor and guide a team of 6 engineers while shaping the direction of infrastructure engineering.
Minimum Qualifications
- Bachelor's degree in Computer Science or related field of study.
- At least 10 years of hands-on coding experience in building internal platforms/tools to support developer experience and operational best practices.
- At least 5 years of experience in cloud platforms—GCP preferred, AWS acceptable; cloud engineering background required.
Preferred Qualifications
- Proven experience scaling infrastructure in environments with many thousands of nodes.
- Track record of leading architectural shifts from monolithic systems to microservices in large-scale environments.
- Deep knowledge of reliability engineering and high-availability systems; able to articulate the impact of increasing the number of 9s.
- Strong understanding of first-party infrastructure integration and unifying disparate systems.
- Familiarity with observability, CI/CD tooling, and infrastructure automation.
- Experience at large-scale tech companies (Google, Meta, Amazon, etc.) or equivalent environments highly preferred.
- Strong cross-functional collaboration skills and the ability to drive infrastructure alignment across engineering orgs.
Skills
GCP, AWS, Kubernetes, CI/CD, Observability, Microservices, Reliability Engineering, Infrastructure Automation, Cloud Engineering, Platform Engineering