Machine Learning Engineer, Distributed Data Systems

295k – 445k · San Francisco, CA · Hybrid
Summary

Designs and scales distributed data infrastructure for large-scale multimodal AI training and evaluation. Collaborates with researchers to build reliable, high-performance systems in a fast-paced environment.

About the role

In this role, you will:

  • Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security.
  • Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient.
  • Partner with researchers to deeply understand requirements and translate them into production-ready systems.
  • Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation.

You might thrive in this role if you:

  • Have strong experience with distributed systems and large-scale infrastructure, along with a keen interest in data.
  • Are detail-oriented and bring rigor to building and maintaining reliable systems.
  • Demonstrate excellent software engineering fundamentals and organizational skills.
  • Are comfortable with ambiguity and rapid change.
Skills

Distributed Systems, Machine Learning Infrastructure, Data Orchestration, Distributed Storage, Streaming Infrastructure, Distributed Compute, Scalable Data Platforms