Staff Site Reliability Engineer

218k – 260k · Mountain View, CA · DevOps / SRE · Onsite · 10+ YOE
Summary

Leads the infrastructure transformation from a monolith to scalable microservices, architects observability and CI/CD systems, unifies a large and complex stack, and mentors engineers. Requires 10+ years of hands-on coding on internal platforms and tools, 5+ years of cloud experience (GCP preferred, AWS acceptable), and a Bachelor's degree in Computer Science or a related field.

About the role

Responsibilities

  • Execute on the transformation from monolith to scalable microservices (API/Platform focus).
  • Drive initiatives to continually improve reliability, with a deep understanding of the implications of each “9.”
  • Architect systems and write code that enables application teams to adopt best practices by default—not by instruction.
  • Integrate and unify diverse infrastructure components into a cohesive, scalable platform within a massive tech stack.
  • Design observability, reliability, and CI/CD frameworks to support growth and operational excellence at scale.
  • Collaborate cross-functionally with product, application, and integration teams to align infrastructure direction with business goals.
  • Provide technical leadership to shift the team from reactive support to a proactive, strategic function.
  • Mentor and guide a team of 6 engineers while shaping the direction of infrastructure engineering.
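
As a rough illustration of what each additional "9" implies, the sketch below converts an availability target into a yearly downtime budget. It is not part of the role description: the function names are hypothetical, and it assumes a 365-day year and counts all unavailability as downtime.

```python
# Downtime budget implied by an availability target with a given number of nines.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(nines: int) -> float:
    """Availability fraction for a count of nines, e.g. 3 -> 0.999."""
    return 1 - 10 ** -nines


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes of allowed downtime per year at the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    for n in range(2, 6):
        budget = downtime_minutes_per_year(n)
        print(f"{n} nines: about {budget:,.1f} minutes of downtime per year")
```

Under these assumptions, three nines (99.9%) allows about 525.6 minutes, or roughly 8.8 hours, of downtime per year, and each added nine shrinks that budget by a factor of ten.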

Minimum Qualifications

  • Bachelor's degree in Computer Science or related field of study.
  • At least 10 years of hands-on coding experience in building internal platforms/tools to support developer experience and operational best practices.
  • At least 5 years of experience in cloud platforms—GCP preferred, AWS acceptable; cloud engineering background required.

Preferred Qualifications

  • Proven experience scaling infrastructure in environments with many thousands of nodes.
  • Track record of leading architectural shifts from monolithic systems to microservices in large-scale environments.
  • Deep knowledge of reliability engineering and high-availability systems; able to articulate the impact of increasing the number of 9s.
  • Strong understanding of first-party infrastructure integration and unifying disparate systems.
  • Familiarity with observability, CI/CD tooling, and infrastructure automation.
  • Experience at large-scale tech companies (Google, Meta, Amazon, etc.) or equivalent environments highly preferred.
  • Strong cross-functional collaboration skills and the ability to drive infrastructure alignment across engineering orgs.
Skills
GCP, AWS, Kubernetes, CI/CD, Observability, Microservices, Reliability Engineering, Infrastructure Automation, Cloud Engineering, Platform Engineering
Similar roles at this salary range
Plaid

Staff Site Reliability Engineer, Release Engineering

Staff SRE on the Release Engineering team defining and scaling reliability practices, architecting SLO/error-budget programs, and driving progressive delivery and automated safety gates across product engineering.

208k – 274k · New York, NY · DevOps / SRE · Hybrid · 8+ YOE · Go · SLO
Fivetran

Senior Site Reliability Engineer

Senior SRE responsible for production infrastructure reliability, incident response, deployment automation, and scaling SaaS systems on Kubernetes and major cloud platforms.

175k – 210k · Oakland, CA · DevOps / SRE · Hybrid · 5+ YOE · AWS · GCP
Dropbox

Senior Infrastructure Software Engineer, Storage Core

Senior engineer building and operating Dropbox's exabyte-scale distributed storage systems. Focus on replication, erasure coding, performance, and reliability in Go/Rust.

180k – 274k · United States · DevOps / SRE · Remote · 9+ YOE · Go · C++
Okta

Staff Site Reliability Engineer - Observability

Staff SRE focused on building and scaling a comprehensive observability platform on GCP using Terraform, Splunk, and Grafana. Requires 5+ years GCP observability experience and strong coding skills in Python or Go.

194k – 267k · Bellevue, WA +4 · DevOps / SRE · Hybrid · 5+ YOE · Go · GKE
Cribl

Sr Software Engineer, Storage

Senior Software Engineer on the Storage team building autoscaling, self-healing infrastructure-as-code systems that manage petabyte-scale telemetry storage on AWS.

175k – 205k · United States · DevOps / SRE · Remote · 5+ YOE · Go · S3