
Training: ML Framework Engineer

205k – 445k · San Francisco, CA · Hybrid
Summary

Develops and optimizes an internal distributed ML training framework to improve hardware efficiency and enable researchers to experiment with new AI models. Requires strong Python skills, a solid understanding of systems, and a passion for performance tuning.

About the role

Responsibilities

  • Apply the latest techniques in our internal training framework to achieve impressive hardware efficiency for our training runs
  • Profile and optimize our training framework
  • Work with researchers to enable them to develop the next generation of models

Requirements

  • Have run small-scale ML experiments
  • Love figuring out how systems work and continuously come up with ideas for how to make them faster while minimizing complexity and maintenance burden
  • Have strong software engineering skills and are proficient in Python
Skills
Python, Distributed Systems, Machine Learning, PyTorch, TensorFlow, GPU Programming, Performance Optimization, Profiling, Supercomputing
Similar roles at this salary range
Databricks

Staff Software Engineer, AI Runtime

Staff Software Engineer building and scaling Databricks' managed large-scale GPU training platform (AIR). Focus on distributed training performance, scheduling, fault tolerance, and developer experience for thousands of accelerators.

190k – 265k · Mountain View, CA +1 · ML Engineering · On-site · FSDP, RoCE
Airbnb

Senior Staff Machine Learning Engineer, Communication & Connectivity

Lead ML architecture and implementation for Airbnb's Messaging & Notifications, building recommendation engines, ranking systems, and LLM-powered experiences while mentoring engineers.

244k – 305k · United States · ML Engineering · Remote · Python, AI Systems
Checkr

Machine Learning Engineer

Build and ship production ML/AI services powering background checks. Own end-to-end ML systems using LLMs, Python, and modern MLOps practices.

168k – 198k · San Francisco, CA · ML Engineering · On-site · NLP, dbt
Chime

Senior AI/ML Engineer

Senior AI/ML Engineer building transformer and deep learning models on financial and behavioral data to power personalized growth and marketing experiences at Chime. Requires strong production ML experience with PyTorch, AWS, and large-scale data infrastructure.

172k – 238k · Chicago, IL +3 · ML Engineering · Hybrid · SQL, AWS
Traba

Staff Software Engineer

Founding Staff Applied Agent Engineer to architect and lead Traba's agentic platform, building production LLM/agent systems that integrate with customer WMS/TMS/ERP systems and drive industrial operations. Requires 7+ years of engineering experience, including 2+ years building production agent systems.

240k – 300k · New York, NY +1 · ML Engineering · On-site · LLM, Kafka