# Software Engineer, Inference
**Company:** [Luma AI](https://hotfix.jobs/companies/lumalabs-ai)
**Location:** Palo Alto, CA
**Salary:** $188K-$395K
**Skills:** Python, PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, Kubernetes, Docker, Linux, Redis
**Posted:** 2026-01-21
> Develops and optimizes inference engines for multimodal AI models, integrating new architectures, building scheduling systems, and managing large-scale GPU deployments. Requires strong Python skills, experience with model serving frameworks such as PyTorch and vLLM, and Kubernetes expertise.
## Job Description
## Role & Responsibilities
- Ship new model architectures by integrating them into our inference engine
- Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
- Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
- Automate, test and maintain our inference services to ensure maximum uptime and reliability
- Optimize deployment workflows to scale across thousands of machines
- Manage and optimize our inference workloads across different clusters & hardware providers
- Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
- Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling

## Background
**Must have:**
- Strong Python and system architecture skills
- Experience with model deployment using PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, or similar
- Experience with queues, scheduling, traffic control, and fleet management at scale
- Experience with Linux, Docker, and Kubernetes

**Bonus points:**
- Experience with modern networking stacks, including RDMA (RoCE, InfiniBand, NVLink)
- Experience with high-performance, large-scale ML systems (>100 GPUs)
- Experience with FFmpeg and multimedia processing

## Tech stack
**Must have:**
- Python
- Redis
- S3-compatible Storage
- Model serving (one of: PyTorch, vLLM, SGLang, Hugging Face)
- Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)

**Nice to have:**
- CUDA
- FFmpeg

## Compensation
The base pay range for this role is $187,500 – $395,000 per year.
**Apply:** https://hotfix.jobs/jobs/software-engineer-inference-at-lumalabs-ai-c31a9085-9d3a-4515-b43b-42516d45d8f1
**Canonical:** https://hotfix.jobs/jobs/software-engineer-inference-at-lumalabs-ai-c31a9085-9d3a-4515-b43b-42516d45d8f1