Staff Software Engineer, Inference
Build and maintain distributed inference systems serving Claude to millions of users. Design intelligent routing, autoscaling, and high-performance infrastructure across diverse AI accelerators.
Key Responsibilities
- Design, build, and maintain the distributed systems that serve Claude to millions of users worldwide
- Develop intelligent request routing, load balancing, and traffic management systems across thousands of accelerators
- Maximize compute efficiency across the fleet by autoscaling and orchestrating production, research, and experimental workloads
- Build and operate production-grade deployment pipelines for releasing new models to users
- Provide high-performance inference infrastructure that enables researchers to develop next-generation models
- Integrate new AI accelerator platforms and support inference for new model architectures
- Use observability data to tune and improve performance based on real-world production workloads
Representative Projects
- Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators
- Autoscaling the compute fleet to dynamically match supply with demand across production, research, and experimental workloads
- Building production-grade deployment pipelines for releasing new models to millions of users
- Integrating new AI accelerator platforms to maintain a hardware-agnostic competitive advantage
- Contributing to new inference features (e.g., structured sampling, prompt caching)
- Supporting inference for new model architectures
- Analyzing observability data to tune performance based on real-world production workloads
- Managing multi-region deployments and geographic routing for global customers
Minimum Qualifications
- Significant software engineering experience, particularly with distributed systems
- Results-oriented, with a bias towards flexibility and impact
- Willingness to pick up slack, even if it goes outside your job description
- Enjoy pair programming
- Desire to learn more about machine learning systems and infrastructure
- Thrive in environments where technical excellence directly drives both business results and research breakthroughs
- Care about the societal impacts of your work
Preferred Qualifications
- Experience with high-performance, large-scale distributed systems
- Experience implementing and deploying machine learning systems at scale
- Experience with load balancing, request routing, or traffic management systems
- Familiarity with LLM inference optimization, batching, and caching strategies
- Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure)
- Proficiency in Python or Rust
Logistics
- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
- Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. If we make you an offer, though, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.