Staff ML Engineer

205k – 330k | Palo Alto, CA | Seattle, WA | Remote | 8+ YOE | Jun 12

Summary

Founding Staff ML Engineer building production ML systems for governance, security, and agentic platform capabilities at Docker. Owns architecture, data pipelines, evaluation, and model lifecycle while mentoring the growing team.

About the role

Responsibilities

Design, train, evaluate, and ship ML systems that power governance and security capabilities, starting with problems like prompt injection detection, behavioral anomaly detection, trust scoring, and policy recommendations.
Build the supporting infrastructure: data pipelines, feature stores, model serving, evaluation harnesses, and the feedback loops that make iteration fast.
Make pragmatic build-vs-buy calls. Use frontier models, off-the-shelf tooling, and managed services to move quickly; invest in custom systems where they create durable advantage.
Set technical direction for the team's ML work. Own the architecture, evaluation methodology, model lifecycle, and the bar for shipping.
Help recruit, mentor, and shape the team as it grows.
Participate in a 24/7 on-call rotation for the Agentic Platform.

Requirements

5+ years of deep applied ML/AI expertise with a track record of shipping production systems.
Experience in fraud, abuse, safety, security, or trust domains, with their adversarial dynamics, imbalanced data, and high-stakes decisions, is valuable.
8+ years of professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
You've built and owned the systems around ML models (data pipelines, serving, evaluation, monitoring, and so on) and have shipped customer-facing products end to end.
You use modern AI tools fluently in your day-to-day work and have a sharp instinct for when frontier models can replace traditional ML, when they can't, and when to combine the two.
Experience with LLM-based systems in production: evaluation, prompt engineering, fine-tuning, retrieval, guardrails, and agent frameworks.
Familiarity with the agent / MCP ecosystem.
You're energized by an early-stage effort where the roadmap is being written as the work happens, and you make crisp decisions with incomplete information.
Collaborative and low-ego. You work well across teams, write clearly, and bring others along.

Skills

Machine Learning, LLMs, Prompt Engineering, Fine-tuning, Retrieval, Guardrails, Agent Frameworks, Data Pipelines, Model Serving, Evaluation, Backend Engineering, Infrastructure
