Software Engineer, Inference

Develops and optimizes inference engines for multimodal AI models, integrating new architectures, building scheduling systems, and managing large-scale GPU deployments. Requires strong Python skills, experience with model-serving frameworks such as PyTorch and vLLM, and Kubernetes expertise.

188k – 395k · Palo Alto, CA · ML Engineering · Hybrid

About the role

Role & Responsibilities

Ship new model architectures by integrating them into our inference engine
Collaborate closely with research, engineering, and infrastructure teams to improve model efficiency and streamline deployments
Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
Automate, test and maintain our inference services to ensure maximum uptime and reliability
Optimize deployment workflows to scale across thousands of machines
Manage and optimize our inference workloads across different clusters & hardware providers
Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
Build and maintain CI/CD pipelines for processing and optimizing model checkpoints, platform components, and SDKs that internal teams integrate into our products and internal tooling

Background

Must have:

Strong Python and system architecture skills
Experience with model deployment using PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, or similar
Experience with queues, scheduling, traffic control, and fleet management at scale
Experience with Linux, Docker, and Kubernetes

Bonus points:

Experience with modern networking stacks and interconnects, including RDMA (RoCE, InfiniBand) and NVLink
Experience with high performance large scale ML systems (>100 GPUs)
Experience with FFmpeg and multimedia processing

Tech stack

Must have:

Python
Redis
S3-compatible storage
Model serving (one of: PyTorch, vLLM, SGLang, Hugging Face)
Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)

Nice to have:

CUDA
FFmpeg

Compensation

The base pay range for this role is $187,500 – $395,000 per year.

Skills

Python, PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, Kubernetes, Docker, Linux, Redis