Senior Software Engineer - ML Infrastructure

153k – 222k | Sunnyvale, CA | Onsite | 3+ YOE
Summary

Builds distributed ML infrastructure, including GPU training, end-to-end pipelines, and deployment platforms. Requires 3+ years of experience with production ML systems, strong software engineering skills, and familiarity with open-source tools.

About the role

Responsibilities

  • Design and implement distributed cloud GPU training approaches for deep learning model training and evaluation
  • Build end-to-end machine learning pipelines and integrate them into core product workflows
  • Encourage change, especially in support of ML engineering best practices, and maintain a high standard of excellence
  • Collaborate with engineers across the entire company to solve complex data problems at scale

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or equivalent
  • 3+ years of professional experience
  • Experience building software components that address full-stack machine learning challenges in production
  • Opinions about building a company-wide platform for ML training, evaluation, and deployment
  • Knowledge of the open source landscape with judgment on when to choose open source versus build in-house
  • Excellent analytical and problem-solving skills

Nice to Have

  • Experience with developing, running, and managing orchestration systems like Airflow and Flyte that non-engineers can use to build data pipelines
  • Experience with ML modeling frameworks (PyTorch, TensorFlow, etc.) and model serving platforms (TorchServe, TensorFlow Serving, NVIDIA Triton Inference Server, etc.)

Compensation

  • Base salary range: $153,000 - $222,000 USD annually
  • Equity, comprehensive health/dental/vision insurance, 401k with employer match, learning/wellness stipends, paid time off
Skills
PyTorch, TensorFlow, Airflow, Flyte, Kubernetes, GPU, Distributed Training, Machine Learning Pipelines, TorchServe, NVIDIA Triton