Training: ML Framework Engineer
Develops and optimizes an internal distributed ML training framework to boost hardware efficiency and enable researchers to experiment with new AI models. Requires strong Python skills, an understanding of systems, and a passion for performance tuning.
Responsibilities
- Apply the latest techniques in our internal training framework to achieve high hardware efficiency for our training runs
- Profile and optimize our training framework
- Work with researchers to enable them to develop the next generation of models
Requirements
- Have run small-scale ML experiments
- Love figuring out how systems work and continuously come up with ideas for making them faster while minimizing complexity and maintenance burden
- Have strong software engineering skills and are proficient in Python
Staff Software Engineer, AI Runtime
Staff Software Engineer building and scaling Databricks' managed large-scale GPU training platform (AIR). Focus on distributed training performance, scheduling, fault tolerance, and developer experience for thousands of accelerators.
Senior Staff Machine Learning Engineer, Communication & Connectivity
Lead ML architecture and implementation for Airbnb's Messaging & Notifications, building recommendation engines, ranking systems, and LLM-powered experiences while mentoring engineers.
Senior AI/ML Engineer
Senior AI/ML Engineer building transformer and deep learning models on financial and behavioral data to power personalized growth and marketing experiences at Chime. Requires strong production ML experience with PyTorch, AWS, and large-scale data infrastructure.
Staff Software Engineer
Founding Staff Applied Agent Engineer to architect and lead Traba's agentic platform, building production LLM/agent systems that integrate with customer WMS/TMS/ERP and drive industrial operations. Requires 7+ years engineering experience with 2+ years building production agent systems.