Senior Software Engineer, Model Serving

Designs and builds scalable infrastructure for high-throughput, low-latency AI/ML model serving on CPU/GPU. Requires 5+ years in distributed systems, inference expertise, and strong system design skills.

166k – 225k | San Francisco, CA | ML Engineering | Onsite | 5+ YOE

About the role

The impact you will have:

Design and implement core systems and APIs that power Databricks Model Serving, ensuring scalability, reliability, and operational excellence.
Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for CPU and GPU serving workloads.
Contribute directly to key components across the serving infrastructure — from model container builds and deployment workflows to runtime systems like routing, caching, observability, and intelligent autoscaling — ensuring smooth and efficient operations at scale.
Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.
Lead technical initiatives that improve latency, availability, and cost-effectiveness across both customer-facing and foundational serving layers.
Establish best practices for code quality, testing, and operational readiness, and mentor other engineers through design reviews and technical guidance.

What we look for:

5+ years of experience building and operating large-scale distributed systems.
Experience in model serving, inference systems, or related infrastructure (e.g., routing, scheduling, autoscaling, and observability).
Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.
Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.
Experience architecting large-scale, performance-sensitive CPU/GPU inference systems.
Strong communication skills and ability to collaborate across teams in fast-moving environments.
Customer-focused mindset with the ability to align implementation details with product goals.
Passion for mentoring, growing engineers, and fostering technical excellence.

Skills

Distributed Systems, Model Serving, Inference Systems, System Design, Kubernetes, Autoscaling, Observability, Routing, Caching, GPU

Similar roles

Databricks

Senior Machine Learning Engineer - GenAI Platform

Build customer-facing generative AI platform covering ML lifecycle from data generation to agent-building. Requires 4+ years experience in distributed systems and ML platforms, with strong product ownership.

166k – 225k | San Francisco, CA | ML Engineering | On-site | 4+ YOE | Go | C++

Mercury

Senior Machine Learning Operations Engineer

Build and operate Mercury's real-time ML inference platform for fraud risk decisioning. Own model deployment, observability, and lifecycle tooling with strong backend Python fundamentals.

167k – 208k | San Francisco, CA +2 | ML Engineering | Hybrid | 5+ YOE | SQL | Shap

Drata

Senior Applied Research Engineer

Senior Applied Research Engineer driving AI system quality through experimentation and evaluation of RAG, retrieval, and reasoning systems. Requires 5+ years applied ML/NLP experience with strong Python and evaluation methodology skills.

167k – 226k | San Francisco, CA | ML Engineering | Hybrid | 5+ YOE | RAG | NLP

Drata

Senior AI Engineer, Agent Harness

Senior AI Engineer to design, build, and scale agentic AI systems using LLMs for compliance automation. Own end-to-end development of production LLM + retrieval + agent workflows with focus on responsible AI.

167k – 226k | San Francisco, CA | ML Engineering | Hybrid | 5+ YOE | RAG | LLMs

Liftoff

Senior GenAI Software Engineer

Senior engineer building and shipping production-grade GenAI systems for ad creative generation, including multimodal models and interactive playables. Requires 5+ years experience, strong Python/JS skills, and proven LLM production experience.

165k – 230k | United States | ML Engineering | Remote | 5+ YOE | LLMs | HTML