Senior Software Engineer, Model Serving

Designs and builds scalable infrastructure for high-throughput, low-latency AI/ML model serving on CPU/GPU. Requires 5+ years in distributed systems, inference expertise, and strong system design skills.

166k – 225k | San Francisco, CA | ML Engineering | Onsite | 5+ YOE

About the role

The impact you will have:

  • Design and implement core systems and APIs that power Databricks Model Serving, ensuring scalability, reliability, and operational excellence.
  • Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for CPU and GPU serving workloads.
  • Contribute directly to key components across the serving infrastructure — from model container builds and deployment workflows to runtime systems like routing, caching, observability, and intelligent autoscaling — ensuring smooth and efficient operations at scale.
  • Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.
  • Lead technical initiatives that improve latency, availability, and cost-effectiveness across both customer-facing and foundational serving layers.
  • Establish best practices for code quality, testing, and operational readiness, and mentor other engineers through design reviews and technical guidance.

What we look for:

  • 5+ years of experience building and operating large-scale distributed systems.
  • Experience in model serving, inference systems, or related infrastructure (e.g., routing, scheduling, autoscaling, and observability).
  • Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.
  • Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.
  • Experience designing the architecture of large-scale, performance-sensitive CPU/GPU inference systems.
  • Strong communication skills and ability to collaborate across teams in fast-moving environments.
  • Customer-focused mindset with the ability to align implementation details with product goals.
  • Passion for mentoring, growing engineers, and fostering technical excellence.

Skills

Distributed Systems, Model Serving, Inference Systems, System Design, Kubernetes, Autoscaling, Observability, Routing, Caching, GPU
