Staff Software Engineer, Inference

320k – 485k | San Francisco, CA | New York, NY | Seattle, WA | Hybrid | 7+ YOE | Jun 8

Summary

Build and maintain distributed inference systems serving Claude to millions of users. Design intelligent routing, autoscaling, and high-performance infrastructure across diverse AI accelerators.

About the role

Key Responsibilities

Design, build, and maintain the distributed systems that serve Claude to millions of users worldwide
Develop intelligent request routing, load balancing, and traffic management systems across thousands of accelerators
Maximize compute efficiency across the fleet by autoscaling and orchestrating production, research, and experimental workloads
Build and operate production-grade deployment pipelines for releasing new models to users
Provide high-performance inference infrastructure that enables researchers to develop next-generation models
Integrate new AI accelerator platforms and support inference for new model architectures
Use observability data to tune and improve performance based on real-world production workloads

Representative Projects

Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators
Autoscaling the compute fleet to dynamically match supply with demand across production, research, and experimental workloads
Building production-grade deployment pipelines for releasing new models to millions of users
Integrating new AI accelerator platforms to maintain a hardware-agnostic competitive advantage
Contributing to new inference features (e.g., structured sampling, prompt caching)
Supporting inference for new model architectures
Analyzing observability data to tune performance based on real-world production workloads
Managing multi-region deployments and geographic routing for global customers

Minimum Qualifications

Significant software engineering experience, particularly with distributed systems
Results-oriented, with a bias towards flexibility and impact
Willingness to pick up slack, even if it goes outside your job description
Enjoy pair programming
Desire to learn more about machine learning systems and infrastructure
Thrive in environments where technical excellence directly drives both business results and research breakthroughs
Care about the societal impacts of your work

Preferred Qualifications

Experience with high-performance, large-scale distributed systems
Experience implementing and deploying machine learning systems at scale
Experience with load balancing, request routing, or traffic management systems
Familiarity with LLM inference optimization, batching, and caching strategies
Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure)
Proficiency in Python or Rust

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

Skills

Python, Rust, Kubernetes, AWS, GCP, Azure, Distributed Systems, Load Balancing, Request Routing, LLM Inference Optimization
