Software Engineer, ML Platform

Builds foundational ML platform infrastructure including model serving pipelines, GPU scheduling systems, and CI/CD for large-scale multimodal AI models. Requires 5+ years in distributed systems with expertise in Python, Kubernetes, and AWS.

188k – 395k · Palo Alto, CA · ML Engineering · Hybrid · 5+ YOE

About the role

What You'll Do

Architect end-to-end model serving pipelines and integrate new model architectures from our research team into our core, high-throughput inference engine.
Build robust and sophisticated scheduling systems to manage jobs based on cluster availability and user priority, ensuring we optimally leverage thousands of expensive GPU resources.
Design and implement dynamic, traffic-based systems for hotswapping models on our GPU workers to maximize fleet efficiency and meet product SLOs.
Own the end-to-end CI/CD pipelines, including creating a resilient artifact store to manage all model checkpoints across multiple versions and providers.
Develop and maintain user-friendly APIs and interaction patterns that empower our product and research teams to ship groundbreaking features at high velocity.
Manage and optimize our complex inference workloads at scale, operating across multiple clusters and hardware providers.

Who You Are

We are looking for a world-class builder with a proven history of creating and managing large-scale, high-performance systems. The following qualifications are non-negotiable:

5+ years of professional engineering experience with deep, hands-on proficiency in Python and complex distributed systems architecture.
Extensive, practical experience building and managing systems at scale, specifically with queues, scheduling, traffic-control, and fleet management.
Deep expertise in our core infrastructure stack: Linux, Docker, and Kubernetes.
Strong experience with Redis, S3-compatible storage, and public cloud platforms (AWS).

What Sets You Apart (Bonus Points)

Experience with high-performance, large-scale ML systems (managing >100 GPUs).
Deep familiarity with PyTorch and CUDA.
Experience with modern networking stacks and interconnects, including RDMA (RoCE, InfiniBand) and NVLink.
Familiarity with FFmpeg and multimedia processing pipelines.

Compensation

The base pay range for this role is $187,500 – $395,000 per year.

Skills

Python, Kubernetes, Docker, Linux, Redis, AWS, PyTorch, CUDA, S3, RDMA
