Distributed LLM Inference Engineer

Build and optimize distributed LLM inference systems at scale using Ray, integrating with engines like vLLM to deliver high-throughput, low-latency batch and online inference solutions.

170k – 247k | San Francisco, CA | Palo Alto, CA | California | ML Engineering | Hybrid

About the role

Responsibilities

Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale for Ray users and Anyscale customers
Work across the stack integrating Ray Data and LLM engines to provide optimizations for low-cost, large-scale ML inference
Integrate with open-source software like vLLM, work with the community to adopt techniques in Anyscale solutions, and contribute improvements to open source
Follow state-of-the-art developments in open source and research, implementing and extending best practices

Requirements

Familiarity with running ML inference at large scale with high throughput and low latency
Familiarity with deep learning and deep learning frameworks (e.g., PyTorch)
Solid understanding of distributed systems and ML inference challenges

Nice-to-Haves

ML Systems knowledge
Experience using Ray
Experience working with the community on LLM engines such as vLLM or TensorRT-LLM
Contributions to deep learning frameworks (PyTorch, TensorFlow)
Contributions to deep learning compilers (Triton, TVM, MLIR)
Prior experience working on GPUs / CUDA

Compensation & Benefits

Market-based compensation approach
Equity (stock options)
Healthcare plans with 99% of premiums covered for employees and dependents
401k Retirement Plan
Education & Wellbeing Stipend
Paid Parental Leave
Fertility Benefits
Paid Time Off
Commute reimbursement
100% of in-office meals covered

Skills

PyTorch, Ray, vLLM, TensorRT-LLM, Distributed Systems, ML Inference, CUDA, Triton, TVM, MLIR, TensorFlow

Similar roles

ML Engineering jobs

Airbnb

Machine Learning Engineer

Build and deploy cutting-edge Agentic AI and LLM systems to transform Airbnb's customer service experience. Requires PhD or equivalent experience and production ML/AI deployment expertise.

170k – 180k | San Francisco, CA +1 | ML Engineering | On-site | 3+ YOE | SFT | RAG

Notable

AI Platform Engineer

Design, build, and maintain LLM integrations powering AI features. Own end-to-end delivery from requirements through production monitoring with focus on scalability and reliability.

170k – 205k | San Mateo, CA | ML Engineering | Hybrid | 5+ YOE | GCP | GKE

Skydio

Autonomy Engineer - ML & DL Infrastructure

Builds and scales ML/DL infrastructure including data pipelines, annotation workflows, training, deployment, and monitoring for autonomous drone systems. Requires hands-on experience in data engineering, cloud ML platforms, containerization, and MLOps.

170k – 278k | San Mateo, CA | ML Engineering | Hybrid | MLOps | Docker

Centralize

Software Engineer (Applied AI)

Owns end-to-end AI systems for a relationship intelligence platform, building multi-agent LLM pipelines, classical ML models for ranking and entity resolution, evals, and data infrastructure to analyze deals and conversations for enterprise revenue teams. Requires production experience shipping LLM/ML products with strong backend skills.

170k – 220k | San Francisco, CA +1 | ML Engineering | Remote | RAG | AWS

Forus

Research Engineer

Research Engineer building production LLM and ML systems for healthcare workflows. Requires strong ML/NLP research background with publications, production deployment experience, and proficiency in PyTorch/TensorFlow/JAX.

170k – 300k | New York, NY | ML Engineering | On-site | JAX | NLP