ML Infrastructure Engineer

160k – 200k | San Francisco, CA | Onsite | 3+ YOE | Apr 25

Summary

Builds and maintains ML infrastructure for training pipelines that handle massive 3D data and for real-time inference serving integrated with CAD software. Requires 3+ years of experience with Python, PyTorch, ML orchestration tools, data versioning, and inference optimization.

About the role

Responsibilities

Design and build a centralized system for versioning training data, generated datasets, and model artifacts, with full lineage tracking from raw source data through to trained model outputs.
Develop and maintain reliable, reproducible ML training and data generation pipelines.
Refactor and harden existing training and data generation scripts into composable, testable, and maintainable components.
Create CI/CD workflows for validating data pipelines and model training runs, including automated correctness checks and regression detection.
Build tooling that enables ML engineers to launch, monitor, and debug training jobs with minimal friction.
Optimize and scale real-time model inference services to meet latency and throughput requirements in production, including profiling, batching strategies, and resource-efficient serving.
Own the deployment path from trained model artifact to production endpoint, ensuring reliable rollouts, rollback, and monitoring.

Requirements

3+ years of work experience in relevant fields.
Bachelor's or Master's degree in Computer Science or Engineering, or equivalent experience.
Strong communication skills and the ability to work closely with ML researchers and engineers to understand their workflows and translate them into robust systems.
Experience designing and building data versioning, artifact management, or dataset lineage systems (e.g., DVC, LakeFS, Weights & Biases, or custom solutions).
Hands-on experience with ML pipeline orchestration tools (e.g., Airflow, Prefect, Metaflow, or similar).
Experience with model serving and inference optimization, such as profiling latency, reducing memory footprint, or scaling serving infrastructure to meet real-time constraints.
Ability to read and refactor ML training code. You don't need to design model architectures, but you need to understand what training pipelines are doing well enough to make them reliable.
Proficiency with Python and PyTorch.

Bonus Qualifications

Familiarity with AWS infrastructure services.
Experience with containerized ML workflows and GPU-accelerated training environments.
Experience with model optimization techniques (e.g., quantization, TensorRT, ONNX Runtime, distillation).
Knowledge of infrastructure-as-code tools (e.g., AWS CDK, Terraform).
Experience building or operating ML systems that handle large unstructured datasets (imagery, 3D data, sensor data).

Skills

Python, PyTorch, Airflow, Prefect, Metaflow, DVC, LakeFS, Weights & Biases, AWS, TensorRT, ONNX Runtime, AWS CDK, Terraform

Similar roles at this salary range

Mozilla

Jun 19

Senior Machine Learning Engineer

Senior ML Engineer focused on fine-tuning and deploying LLMs and generative AI features into Firefox, emphasizing privacy, latency, and user experience.

139k – 218k | United States | ML Engineering | Remote | 4+ YOE | Ray, LangChain

Ironclad

Jun 18

Senior Software Engineer, AI

Lead design and delivery of high-priority AI initiatives across multiple codebases. Build and ship AI-powered features with strong backend fundamentals and product sense.

180k – 220k | San Francisco, CA | ML Engineering | Hybrid | 5+ YOE | React, Evals

Mercury

Jun 18

Senior Machine Learning Operations Engineer

Build and operate Mercury's real-time ML inference platform for fraud risk decisioning. Own model deployment, observability, and lifecycle tooling with strong backend Python fundamentals.

167k – 208k | San Francisco, CA +2 | ML Engineering | Hybrid | 5+ YOE | SQL, SHAP

Distyl AI

Jun 18

AI Engineer, Evaluation

Design and implement evaluation frameworks and pipelines for AI systems using Evaluation-Driven Development. Build Python-based test suites, LLM graders, and measurement systems that guide prompt iteration and production deployment decisions.

150k – 250k | San Francisco, CA +1 | ML Engineering | Hybrid | 2+ YOE | Python, AI Systems

Grafana Labs

Jun 18

Senior AI Engineer

Senior Engineer building multi-agent AI systems, LLM integrations, and backend automation services that power Marketing Operations. Owns technical direction for agentic infrastructure connecting models to business systems.

154k – 185k | United States | ML Engineering | Remote | 8+ YOE | RAG, Git
