Senior Software Engineer - ML Infrastructure
Builds distributed ML infrastructure including GPU training, end-to-end pipelines, and deployment platforms. Requires 3+ years of experience in production ML systems, strong software engineering, and familiarity with open-source tools.
Responsibilities
- Design and implement distributed, cloud-based GPU training approaches for training and evaluating deep learning models
- Build end-to-end machine learning pipelines and integrate them into core product workflows
- Encourage change, especially in support of ML engineering best practices, and maintain a high standard of excellence
- Collaborate with engineers across the entire company to solve complex data problems at scale
Requirements
- Bachelor's degree in Computer Science, Software Engineering, or equivalent
- 3+ years of professional experience
- Experience building software components that address full-stack machine learning challenges in production
- Opinions about building a company-wide platform for ML training, evaluation, and deployment
- Knowledge of the open-source landscape, with judgment on when to choose open source versus building in-house
- Excellent analytical and problem-solving skills
Nice to Have
- Experience developing, running, and managing orchestration systems such as Airflow and Flyte that non-engineers can use to build data pipelines
- Experience with ML modeling frameworks (PyTorch, TensorFlow, etc.) and model serving platforms (TorchServe, TensorFlow Serving, NVIDIA Triton Inference Server, etc.)
Compensation
- Base salary range: $153,000 - $222,000 USD annually
- Equity, comprehensive health/dental/vision insurance, 401(k) with employer match, learning/wellness stipends, paid time off