Machine Learning Eval Engineer

150k – 300k | San Francisco, CA | ML Engineering | Onsite | Mar 16

Summary

Build and maintain ML evaluation benchmarks and metrics to identify model weaknesses on unstructured enterprise data like PDFs and spreadsheets. Collaborate with ML teams to drive improvements using Python tools and data infrastructure.

About the role

What You’ll Do

Design, build, and maintain evaluation benchmarks that reveal where our models perform well and where they fail.
Develop metrics, heuristics, and workflows to automatically identify new failure modes across large and messy real-world datasets.
Partner closely with other ML engineers to turn evaluation insights into model improvements and better training priorities.
Work hands-on with unstructured enterprise data, including PDFs, spreadsheets, and other difficult document formats, to uncover edge cases and hard examples.
Build lightweight internal and user-facing tools, including simple interfaces in Python frameworks like Flask, to help teams inspect results, analyze model behavior, and communicate evaluation outcomes.
Collaborate with customers and internal teams to understand real-world data needs and create bespoke benchmarks that highlight Reducto’s strengths.

You’ll Thrive Here If You

Hold yourself to a high bar for quality and precision.
Enjoy solving complex problems and building from first principles.
Have strong Python skills and can independently build clean, reliable technical solutions. Bonus points for product and frontend experience!
Are comfortable working with data infrastructure such as AWS S3 and OLAP or analytics systems like Tinybird.
Love getting your hands dirty with unstructured data and chasing down difficult failure cases.
Operate well in fast-changing, high-growth environments.
Collaborate effectively across technical and non-technical teams.
Take full ownership from strategy through execution.

Bonus points if you

Have experience at an early-stage or high-growth startup.
Have some background in product thinking and can build simple, polished user-facing interfaces.
Are comfortable working directly with customers to understand their workflows and data needs.
Have experience in AI/ML, data infrastructure, enterprise software, or document understanding systems.
Care deeply about combining technical excellence with business impact.

Skills

Python, AWS S3, Flask, Tinybird, ML evaluation, benchmarks, metrics, heuristics, unstructured data, PDFs, spreadsheets, OLAP
