Technical Lead, Evaluation Infrastructure

$194k – $291k | Mountain View, CA | ML Engineering | Onsite | 4+ YOE | May 28

Summary

Lead the Evaluation Infrastructure team, which builds the metrics, evaluation pipelines, and validation platforms that support autonomous vehicle safety and iteration. Requires 4+ years of experience in distributed systems and ML evaluation, plus strong Python/C++ skills and AI-native engineering practices.

About the role

Responsibilities

Build and own a unified metrics, evaluation, and validation platform: pipelines, introspection tooling, and analysis products that turn on-road and simulation logs into high-fidelity signals for autonomy iteration and driverless safety validation
Drive the technical bar for metric quality across both heuristic and ML-based approaches
Invest in the scale, reliability, and CI/CD of the evaluation stack to shorten time-to-signal for evaluation and time-to-confidence for validation, and to meet high SLAs for downstream stakeholders
Mentor and grow the Evaluation Infrastructure team, and champion AI-native engineering practices that compound team velocity and code quality
Partner with Product, Autonomy, Systems & Safety, and Simulation teams to define and execute the vision and strategy for evaluation

Requirements

B.Sc. or M.Sc. degree plus 4 years of relevant work experience
Strong fluency in distributed systems, large-scale data and ML evaluation pipelines, metrics frameworks (heuristic and/or ML-based), and analytics platforms
Experience setting technical vision, roadmap, and prioritization for a team operating at the intersection of autonomy, safety, and data infrastructure
Clear, concise communicator who partners effectively with PMs, engineers, and cross-functional stakeholders
Ability and willingness to dive deep into implementation details
Sets the technical bar for metric quality, pipeline rigor, and safety-critical engineering practice
Strong proficiency in Python, C++, or similar languages
Daily user of modern AI coding assistants and agentic tools (Claude Code, Cursor, and similar), with strong intuition for where they accelerate engineering work

Nice-to-Haves

Knowledge of data engineering tooling and best practices
Knowledge of batch and streaming data processing, warehousing, and analytics solutions
Experience with data workflow orchestration platforms
Prior experience building evaluation, validation, or analytics platforms, ideally in autonomy, robotics, or safety-critical systems

Compensation & Benefits

Base pay range: $193,930 – $291,150/year
Annual performance bonus and equity
Competitive benefits package

Skills

Python, C++, Distributed Systems, ML Evaluation Pipelines, Metrics Frameworks, Data Engineering, Batch Processing, Streaming Data, Data Workflow Orchestration, CI/CD