Distributed LLM Inference Engineer

Build and optimize distributed LLM inference systems at scale using Ray, integrating with engines like vLLM to deliver high-throughput, low-latency batch and online inference solutions.

170k – 247k | San Francisco, CA | Palo Alto, CA | California | ML Engineering | Hybrid

About the role

Responsibilities

  • Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale for Ray users and Anyscale customers
  • Work across the stack integrating Ray Data and LLM engines to provide optimizations for low-cost, large-scale ML inference
  • Integrate with open-source software like vLLM, work with the community to adopt techniques in Anyscale solutions, and contribute improvements to open source
  • Follow state-of-the-art developments in open source and research, implementing and extending best practices

Requirements

  • Familiarity with running ML inference at large scale with high throughput and low latency
  • Familiarity with deep learning and deep learning frameworks (e.g., PyTorch)
  • Solid understanding of distributed systems and ML inference challenges

Nice-to-Haves

  • Knowledge of ML systems
  • Experience using Ray
  • Experience working with the community on LLM engines such as vLLM and TensorRT-LLM
  • Contributions to deep learning frameworks (PyTorch, TensorFlow)
  • Contributions to deep learning compilers (Triton, TVM, MLIR)
  • Prior experience working on GPUs / CUDA

Compensation & Benefits

  • Market-based compensation approach
  • Equity (stock options)
  • Healthcare plans with 99% of premiums covered for employees and dependents
  • 401(k) retirement plan
  • Education & Wellbeing Stipend
  • Paid Parental Leave
  • Fertility Benefits
  • Paid Time Off
  • Commute reimbursement
  • 100% of in-office meals covered

Skills

PyTorch, Ray, vLLM, TensorRT-LLM, Distributed Systems, ML Inference, CUDA, Triton, TVM, MLIR, TensorFlow
