# Distributed LLM Inference Engineer
**Company:** [Anyscale](https://hotfix.jobs/companies/anyscale)
**Location:** San Francisco, CA; Palo Alto, CA; California
**Salary:** $170K-$247K
**Skills:** PyTorch, Ray, vLLM, TensorRT-LLM, Distributed Systems, ML Inference, CUDA, Triton, TVM, MLIR, TensorFlow
**Posted:** 2026-05-27
> Build and optimize distributed LLM inference systems at scale using Ray, integrating with engines like vLLM to deliver high-throughput, low-latency batch and online inference solutions.
## Job Description
## Responsibilities
- Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale for Ray users and Anyscale customers
- Work across the stack integrating Ray Data and LLM engines to provide optimizations for low-cost, large-scale ML inference
- Integrate with open-source software like vLLM, work with the community to adopt techniques in Anyscale solutions, and contribute improvements to open source
- Follow state-of-the-art developments in open source and research, implementing and extending best practices

## Requirements
- Familiarity with running ML inference at large scale with high throughput and low latency
- Familiarity with deep learning and deep learning frameworks (e.g., PyTorch)
- Solid understanding of distributed systems and ML inference challenges

## Nice-to-Haves
- ML Systems knowledge
- Experience using Ray
- Experience working with the community on LLM engines such as vLLM or TensorRT-LLM
- Contributions to deep learning frameworks (PyTorch, TensorFlow)
- Contributions to deep learning compilers (Triton, TVM, MLIR)
- Prior experience working on GPUs / CUDA

## Compensation & Benefits
- Market-based compensation approach
- Equity (stock options)
- Healthcare plans with 99% of premiums covered for employees and dependents
- 401(k) Retirement Plan
- Education & Wellbeing Stipend
- Paid Parental Leave
- Fertility Benefits
- Paid Time Off
- Commute reimbursement
- 100% of in-office meals covered
**Apply:** https://hotfix.jobs/jobs/distributed-llm-inference-engineer-at-anyscale-d33066df-7725-4fc7-9306-c2a1eacc4ef1
**Canonical:** https://hotfix.jobs/jobs/distributed-llm-inference-engineer-at-anyscale-d33066df-7725-4fc7-9306-c2a1eacc4ef1