Staff Software Engineer, Inference

320k – 485k · San Francisco, CA · New York, NY · Seattle, WA · Hybrid · 7+ YOE
Summary

Build and maintain distributed inference systems serving Claude to millions of users. Design intelligent routing, autoscaling, and high-performance infrastructure across diverse AI accelerators.

About the role

Key Responsibilities

  • Design, build, and maintain the distributed systems that serve Claude to millions of users worldwide
  • Develop intelligent request routing, load balancing, and traffic management systems across thousands of accelerators
  • Maximize compute efficiency across the fleet by autoscaling and orchestrating production, research, and experimental workloads
  • Build and operate production-grade deployment pipelines for releasing new models to users
  • Provide high-performance inference infrastructure that enables researchers to develop next-generation models
  • Integrate new AI accelerator platforms and support inference for new model architectures
  • Use observability data to tune and improve performance based on real-world production workloads

Representative Projects

  • Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators
  • Autoscaling the compute fleet to dynamically match supply with demand across production, research, and experimental workloads
  • Building production-grade deployment pipelines for releasing new models to millions of users
  • Integrating new AI accelerator platforms to maintain a hardware-agnostic competitive advantage
  • Contributing to new inference features (e.g., structured sampling, prompt caching)
  • Supporting inference for new model architectures
  • Analyzing observability data to tune performance based on real-world production workloads
  • Managing multi-region deployments and geographic routing for global customers

Minimum Qualifications

  • Significant software engineering experience, particularly with distributed systems
  • Results-oriented, with a bias towards flexibility and impact
  • Willingness to pick up slack, even if it goes outside your job description
  • Enjoy pair programming
  • Desire to learn more about machine learning systems and infrastructure
  • Thrive in environments where technical excellence directly drives both business results and research breakthroughs
  • Care about the societal impacts of your work

Preferred Qualifications

  • Experience with high-performance, large-scale distributed systems
  • Experience implementing and deploying machine learning systems at scale
  • Experience with load balancing, request routing, or traffic management systems
  • Familiarity with LLM inference optimization, batching, and caching strategies
  • Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure)
  • Proficiency in Python or Rust

Logistics

  • Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
  • Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
  • Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
  • Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
Skills

Python, Rust, Kubernetes, AWS, GCP, Azure, Distributed Systems, Load Balancing, Request Routing, LLM Inference Optimization