AI Agent Infrastructure Lead

Leads development of internal AI agent infrastructure ("Goose") to boost velocity across engineering, ops, and other teams. Builds autonomous agent workflows for codebase inspection, testing, and complex tasks, with a strong focus on safety and accuracy.

230k – 260k · San Francisco, CA · DevOps / SRE · On-site

About the role

What You'll Do

Within days

  • Ship AI agent features that help Sphere's engineering team use agents more effectively and responsibly.
  • Iterate on Sphere's agent sandbox environments.
  • Create workflows where agents can inspect the codebase, run local infrastructure, make changes, run tests, and prepare work for human review.
  • Work directly with engineers to identify high-leverage internal workflows where agents can create immediate velocity.

Within months

  • Lead Sphere's internal efforts to enable AI agents to act more autonomously across engineering, ops, customer success, tax research, and implementation.
  • Build "Goose" for Sphere: the internal AI agent layer that helps agents understand and operate across Sphere's systems.
  • Own the infrastructure, tooling, and workflows that let agents safely take on more complex internal work over time.
  • Establish the patterns for how Sphere uses agents internally, including context, permissions, review, observability, and escalation.

Requirements

  • Experience building production-quality software.
  • Experience in AI agents, coding agents, internal developer tooling, or AI agent enablement.
  • Comfort working across backend systems, infrastructure, local development environments, CI, and internal tools.
  • Strong judgment around autonomy, safety, permissions, and human review.
  • High agency. You can take a vague internal problem and turn it into a working system people actually use.
  • Strong attention to detail. Agents are only useful here if they improve speed without reducing correctness.

Skills

AI Agents · Coding Agents · Backend Systems · Infrastructure · CI/CD · Internal Developer Tooling · Sandbox Environments · Observability · Permissions Management · Local Development Environments

Similar roles

DevOps / SRE jobs

Software Engineer

Design, build, and operate large-scale infrastructure services and automation tooling. Requires 4 years of experience with distributed systems, Kubernetes, IaC, CI/CD, and cloud infrastructure.

230k – 270k · San Francisco, CA · DevOps / SRE · Hybrid · 4+ YOE · AWS · GCP

Software Engineer, Productivity - Inference Runtime

Builds and improves CI/CD, testing, validation, and release tooling for OpenAI's inference runtime teams to ensure reliable, performant model deployments across ChatGPT, API, and research workloads. Requires strong Python skills, developer productivity experience, and high ownership in ambiguous environments.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · GPU

Software Engineer, Core Network Engineering

Builds and operates high-performance networking infrastructure for OpenAI's large-scale AI training and inference, focusing on host networking, datacenter fabrics, and WAN systems. Optimizes latency, reliability, and scalability using technologies like RDMA, InfiniBand, and RoCE; requires strong systems programming in C++, Python, or Go.

230k – 342k · San Francisco, CA · DevOps / SRE · On-site · Go · C++

Software Engineer, Productivity - Model Performance

Builds and improves developer tools, CI/CD pipelines, and testing workflows to boost productivity for OpenAI's model performance engineering teams. Requires strong Python skills, experience with developer infrastructure, and ability to work in ambiguous environments.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · Rust

Software Engineer, Productivity - Networking

Enhances developer productivity for OpenAI's networking team by improving build systems, CI/CD pipelines, test harnesses, and workflows for C++ and Python codebases in multi-server environments. Requires experience with developer tools and infrastructure automation.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · CI/CD