ML Infrastructure Engineer

160k – 200k · San Francisco, CA · Onsite · 3+ YOE
Summary

Builds and maintains ML infrastructure for training pipelines handling massive 3D data and real-time inference serving integrated with CAD software. Requires 3+ years of experience with Python, PyTorch, ML orchestration tools, data versioning, and inference optimization.

About the role

Responsibilities

  • Design and build a centralized system for versioning training data, generated datasets, and model artifacts, with full lineage tracking from raw source data through to trained model outputs.
  • Develop and maintain reliable, reproducible ML training and data generation pipelines.
  • Refactor and harden existing training and data generation scripts into composable, testable, and maintainable components.
  • Create CI/CD workflows for validating data pipelines and model training runs, including automated correctness checks and regression detection.
  • Build tooling that enables ML engineers to launch, monitor, and debug training jobs with minimal friction.
  • Optimize and scale real-time model inference services to meet latency and throughput requirements in production, including profiling, batching strategies, and resource-efficient serving.
  • Own the deployment path from trained model artifact to production endpoint, ensuring reliable rollouts, rollback, and monitoring.

Requirements

  • 3+ years of work experience in relevant fields.
  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.
  • Strong communication skills and the ability to work closely with ML researchers and engineers to understand their workflows and translate them into robust systems.
  • Experience designing and building data versioning, artifact management, or dataset lineage systems (e.g., DVC, LakeFS, Weights & Biases, or custom solutions).
  • Hands-on experience with ML pipeline orchestration tools (e.g., Airflow, Prefect, Metaflow, or similar).
  • Experience with model serving and inference optimization — profiling latency, reducing memory footprint, or scaling serving infrastructure to meet real-time constraints.
  • Ability to read and refactor ML training code — you don't need to design model architectures, but you need to understand what training pipelines are doing well enough to make them reliable.
  • Proficiency with Python and PyTorch.

Bonus Qualifications

  • Familiarity with AWS infrastructure services.
  • Experience with containerized ML workflows and GPU-accelerated training environments.
  • Experience with model optimization techniques (e.g., quantization, TensorRT, ONNX Runtime, distillation).
  • Knowledge of infrastructure-as-code tools (e.g., AWS CDK, Terraform).
  • Experience building or operating ML systems that handle large unstructured datasets (imagery, 3D data, sensor data).
Skills
Python, PyTorch, Airflow, Prefect, Metaflow, DVC, LakeFS, Weights & Biases, AWS, TensorRT, ONNX Runtime, AWS CDK, Terraform