Site Reliability Engineering

The Site Reliability Engineer owns the lifecycle of services powering autonomous vehicles: designing fault-tolerant systems, building monitoring tools, leading incident response, and ensuring the resilience of infrastructure that handles large-scale data processing on CPUs and GPUs. Requires 5+ years of SRE experience, cloud and IaC expertise, Kubernetes, and strong programming skills.

140k – 230k | Foster City, CA | DevOps / SRE | Hybrid | 5+ YOE

About the role

Responsibilities

  • Architect and optimize scalable systems by designing, implementing, and improving highly reliable infrastructure.
  • Build proactive monitoring solutions, including advanced alerting and reporting tools.
  • Collaborate across engineering teams to elevate system architecture, streamline deployments, and drive automation.
  • Lead incident resolution by conducting root cause analyses and deploying corrective actions.
  • Ensure business continuity by designing and implementing disaster recovery plans.

Qualifications

Required:

  • 5+ years of experience in site reliability engineering or similar, managing large-scale distributed systems.
  • Proven experience with major cloud platforms (AWS, GCP, Azure) and IaC tools (Terraform, Ansible, Salt, CloudFormation).
  • Technical expertise in container orchestration technologies such as Kubernetes.
  • Deep understanding of networking protocols, storage solutions, and database technologies.
  • Strong programming and scripting skills in Python, Go, C/C++, or Java.

Bonus:

  • Experience in the automotive or autonomous vehicle industry.
  • Knowledge of security best practices and compliance requirements.

Skills

Kubernetes, Terraform, Ansible, AWS, GCP, Azure, Python, Go, C/C++, Java
