Software Engineer, Inference - Performance Optimization

295k – 555k · San Francisco, CA · Onsite · Apr 25

Summary

Builds performance models of inference across the application, model, and fleet layers, using microbenchmarks to produce cost-to-serve estimates. Analyzes workloads end to end, improves bottleneck-detection tooling, and works with other teams on optimizations for latency, throughput, and cost.

About the role

Responsibilities

Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
Analyze inference workloads end to end across applications, models, and fleet infrastructure.
Enhance tooling to identify bottlenecks across layers for latency and throughput.
Partner with other teams to turn performance insights into concrete improvements, and project how future changes will affect inference performance.

Requirements

Enjoy reasoning from first principles about distributed systems, model inference, and hardware efficiency.
Comfortable working across abstraction layers, from application behavior to kernels, accelerators, networking, and fleet scheduling.
Deep expertise in performance profiling, benchmarking, analysis, and optimization.
Enjoy collaborating with engineering and research teams to improve real production systems.

Skills

performance modeling, profiling, benchmarking, distributed systems, model inference, hardware optimization, kernels, accelerators, networking, fleet scheduling

Similar roles at this salary range


Anthropic

Jun 8

Staff Software Engineer, Inference

Build and maintain distributed inference systems serving Claude to millions of users. Design intelligent routing, autoscaling, and high-performance infrastructure across diverse AI accelerators.

320k – 485k · San Francisco, CA +2 · ML Engineering · Hybrid · AWS · GCP

Airbnb

Jun 8

Senior Staff Machine Learning Engineer, Communication & Connectivity

Lead ML architecture and implementation for Airbnb's Messaging & Notifications, building recommendation engines, ranking systems, and LLM-powered experiences while mentoring engineers.

244k – 305k · United States · ML Engineering · Remote · Python · AI Systems

Traba

Jun 8

Staff Software Engineer

Founding Staff Applied Agent Engineer to architect and lead Traba's agentic platform, building production LLM/agent systems that integrate with customer WMS/TMS/ERP systems and drive industrial operations. Requires 7+ years of engineering experience, including 2+ years building production agent systems.

240k – 300k · New York, NY +1 · ML Engineering · On-site · LLM · Kafka

Nuance Labs

Jun 5

Member of Technical Staff — Model Optimization and Inference

Optimize inference for real-time multimodal AI avatars. Specialize in LLM and diffusion model serving, KV cache strategies, quantization, and low-latency frameworks like vLLM and TensorRT-LLM.

250k – 350k · Seattle, WA · ML Engineering · On-site · AWQ · vLLM

OpenAI

Jun 5

Researcher: Agent Post-Training, API & Power-Users

Improve agentic model capabilities for API and power users by designing experiments, building evals from real workflows, and driving post-training interventions from discovery through launch.

295k – 445k · San Francisco, CA · ML Engineering · Hybrid · RL · LLMs
