Production Engineer, IaaS
Own observability, API surface, and control plane for a hyperscale AI compute fleet. Build production-grade data pipelines, stateful APIs, and Kubernetes infrastructure that other teams depend on.
Owns and builds core parts of the Forge platform end-to-end, including the IDE, compiler, runtime, and infrastructure for an AI-powered English programming language. Requires staff+ engineering experience building products from 0 to 1 with deep technical craft.
Own end-to-end health, repair automation, and qualification of a hyperscale GPU/TPU compute fleet. Build metrics pipelines, firmware tooling, and self-healing repair workflows across Kubernetes and bare metal.
Builds and scales core infrastructure including ML training/serving, Kubernetes clusters, and low-latency voice/audio pipelines. Requires 3+ years in infrastructure/ML systems, hands-on reliability engineering, and Kubernetes expertise.
Builds and operates reliable, scalable AI infrastructure for ultra-low-latency serverless compute, covering observability, SLOs, incident response, automation, and performance tuning. Requires 3+ years of SRE/DevOps experience with cloud platforms, Kubernetes, programming (Go/Rust/Python), and observability tools.