Software Reliability Engineer

Builds and operates resilient systems for autonomous vehicle fleet reliability, including signal-analysis pipelines, automated triage tools, and internal workflows, and leads reliability investigations. Requires production software experience and strong debugging skills in Python, Go, Bash, and C++.

146k – 219k | Mountain View, CA | DevOps / SRE | Onsite

About the role

Responsibilities

  • Build fleet-scale pipelines that turn noisy onboard signals into actionable, high-confidence investigations.
  • Develop automated triage and correlation systems that deduplicate issues, route them to the right owning teams, and attach up-to-date priority signals and diagnostic context.
  • Partner with engineering teams and subject matter experts to turn investigation outcomes into better instrumentation, automation, and signal quality over time.
  • Build internal tools and workflows that reduce duplicate effort and increase situational awareness as the fleet scales (self-service debugging, standardized metrics, shared templates, securely scoped access).
  • Lead reliability investigations to identify contributing factors and ensure learnings turn into durable engineering changes.

Requirements

  • Experience writing and shipping software that runs in production, with an ownership mindset and attention to how it behaves in real-world conditions.
  • Ability to build and maintain tools and automation that enable other engineers: internal tools, instrumentation, and visualizations (Python, Go, Bash, C++).
  • Strong debugging fundamentals across the stack, including using system signals and live troubleshooting to form hypotheses and identify contributing factors.
  • Strong interest in reliability engineering as a growth path: motivated by making complex systems understandable, resilient, and easier to run as they scale.

Nice-to-Haves

  • Background in distributed systems or real-world deployed systems (vehicles, robotics, IoT, or similar).
  • Familiarity with production telemetry and observability.
  • Experience applying reliability metrics and operational feedback loops to drive improvements.
  • Exposure to cross-team reliability work in mission-critical environments.

Compensation

Base pay range: $145,830 - $219,000 (depending on experience, qualifications, education, location, skills). Eligible for annual performance bonus, equity, and competitive benefits package.

Skills

Python, Go, Bash, C++, Distributed Systems, Observability, Telemetry, Reliability Engineering, Debugging, Automation
