Software Reliability Engineer
Build and operate resilient systems for Nuro's autonomous vehicle fleet. Design pipelines, automation, and tools to improve reliability and reduce operational toil. Join the on-call rotation and lead reliability investigations.
About the Work
- Build fleet-scale pipelines that turn noisy onboard signals into actionable, high-confidence investigations.
- Develop automated triage and correlation systems that deduplicate issues, route them to the right owning teams, and attach up-to-date priority signals and diagnostic context.
- Partner with engineering teams and subject matter experts to turn investigation outcomes into better instrumentation, automation, and signal quality over time.
- Build internal tools and workflows that reduce duplicate effort and increase situational awareness as the fleet scales (self-service debugging, standardized metrics, shared templates, securely scoped access).
- Lead reliability investigations to identify contributing factors and ensure learnings turn into durable engineering changes.
About You
- Experience writing and shipping software with an ownership mindset and attention to how it behaves in real-world conditions.
- Ability to build and maintain tools and automation (Python, Go, Bash, C++).
- Comfortable navigating systems remotely via SSH and the CLI, and inspecting the state of a Linux system and its services.
- Interest in reliability engineering as a growth path: motivated to learn how to build distributed systems and to tackle the challenges of scaling them reliably.
This is a 12-month temporary full-time position with full benefits and potential for extension based on performance and business needs.
Site Reliability Engineer - AI Agents
Design, build, and operate reliable infrastructure for AI agent workflows and model serving on AWS and Kubernetes. Build platform APIs, SDKs, and self-service tooling while ensuring observability and incident response for production AI systems.
Baremetal Infrastructure Engineer
Deploy and support Nominal's self-hosted platform in customer environments including air-gapped and regulated sites. Own Linux, Kubernetes, and bare-metal infrastructure reliability while partnering directly with customer IT and security teams.
Senior Site Reliability Engineer
Operate and evolve the EKS Kubernetes platform, CI/CD pipelines, and observability stack for Thunderbird's open-source infrastructure. Requires 7+ years of infrastructure experience and strong production Kubernetes and IaC skills.