Software Engineer, ML Performance Optimization
Drive ML performance optimization initiatives to make autonomous driving models faster and more efficient using distributed training, quantization, distillation, and profiling tools.
Responsibilities
- Design, implement, and operate cutting-edge ML training or inference performance optimization techniques to scale VLM, VLA, and foundation models and deploy them efficiently in robotaxis.
- Collaborate closely with cross-functional teams, including ML researchers, software engineers, data engineers, and hardware engineers, to define requirements and align on architectural decisions.
Requirements
- 4+ years of total experience, including 2+ years of working on large-scale model training or inference platforms.
- Experience with training frameworks like PyTorch, leveraging GPUs efficiently for distributed model training.
- Experience with GPU-accelerated inference using TensorRT or similar frameworks.
- Experience using profiling tools like NVIDIA Nsight or the PyTorch Profiler to identify model training and serving bottlenecks.
- Proficient in Python or C++.
Nice-to-Haves
- Experience with distributed training techniques, quantization, distillation, and pruning.
- Experience working with state-of-the-art (SOTA) accelerators and inference optimization frameworks.
Staff Software Engineer, AI Runtime
Staff Software Engineer building and scaling Databricks' managed large-scale GPU training platform (AIR). Focus on distributed training performance, scheduling, fault tolerance, and developer experience for thousands of accelerators.
Senior Software Engineer, AI Runtime
Senior Software Engineer building and scaling Databricks' managed GPU training platform (AI Runtime) for large-scale distributed AI model training. Requires 5+ years in distributed systems and hands-on experience with GPU training frameworks.
Sr. Machine Learning Engineer, Computer Vision
Build and prototype diffusion-based text-to-image generative models (Pinterest Canvas) using large-scale visual-text datasets. Requires 5+ years industry computer vision experience and an M.S. or Ph.D.
Senior AI/ML Engineer
Senior AI/ML Engineer building transformer and deep learning models on financial and behavioral data to power personalized growth and marketing experiences at Chime. Requires strong production ML experience with PyTorch, AWS, and large-scale data infrastructure.