
ML Model Serving Engineer

175k – 280k | San Francisco, CA | New York, NY | Bellevue, WA | ML Engineering | Onsite
Summary

Optimizes and extends ML model serving infrastructure for LLMs, speech, and vision models, focusing on high-throughput, low-latency inference using frameworks like vLLM and SGLang. Requires deep PyTorch expertise, systems programming, and performance engineering for reliable production deployment.

About the role

Responsibilities

  • Turbocharge our serving layer, which hosts a variety of LLM, speech, and vision models.
  • Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category.
  • Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving.
  • Work with the training team to identify opportunities to produce faster models without sacrificing quality.
  • Use techniques like in-flight batching, caching, and custom kernels to speed up inference.
  • Find ways to reduce model initialization times without sacrificing quality.

Required Qualifications

  • Expert in a differentiable array computing framework, preferably PyTorch.
  • Expert in optimizing machine learning models for serving reliably at high throughput, with low latency.
  • Significant systems programming experience (e.g., working on high-performance server systems; as comfortable with the internals of vLLM as with a complex PyTorch codebase).
  • Significant performance engineering experience (e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code).
  • Always up to date on the latest techniques for model serving optimization.

Preferred Qualifications

  • Familiarity with high-performance LLM serving (e.g., experience with vLLM or SGLang deployment and internals).
  • Experience with a public cloud platform such as GCP, AWS, or Azure.
  • Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.
  • Track record of leading complex multi-month projects without assistance.

Benefits

  • 401(k) max employer match: 3.5% of compensation
  • 100% employer-paid health, vision, and dental benefits for you and your dependents
  • Unlimited PTO and sick time
  • Flexible spending account with employer matching up to $1,650/year (medical FSA)
  • Guardian Employee Assistance Program (EAP)
  • Competitive stock options
Skills
PyTorch, vLLM, SGLang, Kubernetes, Ray, GCP, AWS, Azure, LLM serving, performance optimization