
Software Engineer, Agent Infrastructure

230k – 385k · San Francisco, CA · New York, NY · Hybrid
Summary

Builds and scales infrastructure for training and deploying AI agents, including novel container orchestration beyond Kubernetes, FastAPI/gRPC APIs, and Terraform-based systems. Collaborates with researchers on high-scale ML environments and production platforms for OpenAI products.

About the role

Responsibilities

  • Push massive compute clusters to their limits as a core contributor to a novel in-house container orchestration platform that scales beyond Kubernetes.
  • Develop and maintain FastAPI and gRPC APIs serving as the interface for agentic infrastructure in training and production.
  • Use Terraform to stand up and evolve complex infrastructure for research and production.
  • Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.

Requirements

  • Deep experience working on large-scale machine learning infrastructure, reasoning about training at scale, identifying bottlenecks, and engineering optimization solutions.
  • Ability to build new things from 0 to 1 quickly and scale them 1,000,000x.
  • Keen eye for performance and optimization in complex, globally-distributed systems.
  • Experience with cloud platforms and infrastructure-as-code tools such as Terraform.
  • Driven by solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.
  • Deep technical expertise in virtualization and containerization technologies (e.g. Kata, Firecracker, gVisor, Sysbox) and a passion for optimizing runtime performance.
Skills
Kubernetes, FastAPI, gRPC, Terraform, Kata, Firecracker, gVisor, Sysbox, container orchestration, machine learning infrastructure
Similar roles at this salary range
Pinterest

Staff Software Engineer, Growth AI

Staff Software Engineer anchoring AI-powered growth products across SEO and exploratory teams. Architect production ML systems, partner with ML orgs, and set technical direction as a senior IC.

208k – 365k · San Francisco, CA +3 · Backend Engineering · Hybrid · Java · LLMs
Traba

Staff Software Engineer

Lead development of core backend systems and platform architecture for an AI-powered industrial supply chain startup. Own architectural decisions, CI/CD, and performance optimization in an early-stage team.

240k – 300k · New York, NY +1 · Backend Engineering · On-site · Kafka · Python
ClickUp

Staff Backend Engineer, Search

Staff-level search engineer responsible for designing, scaling, and optimizing ClickUp's search infrastructure using OpenSearch/ElasticSearch, including real-time indexing, vector search, and relevance tuning.

250k – 300k · United States · Backend Engineering · Remote · NLP · Indexing
ClickUp

Senior Backend Engineer, Search

Senior Search Engineer responsible for designing, optimizing, and scaling search infrastructure using OpenSearch/ElasticSearch, improving relevance and speed, and building vector search capabilities.

200k – 250k · United States · Backend Engineering · Remote · NLP · Indexing
GlossGenius

Staff Software Engineer, Backend

Staff Backend Engineer leading architecture and technical direction for AI-powered products. Owns system design, mentors engineers, and builds proof-of-concepts in Kotlin on AWS/Kubernetes.

241k – 284k · New York, NY · Backend Engineering · Hybrid · AWS · LLMs