Evaluation Engineer

Builds fast, reliable auto-evaluation infrastructure for an AI research platform focused on supporting decision-making in pharma. Owns the backend systems, ML eval interfaces and dashboards, and statistical reliability; requires 3+ years of backend experience.

140k – 200k | Oakland, CA | Backend Engineering | Remote | 3+ YOE


About the role

Responsibilities

Core auto-eval platform

Build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:
- Speed: Build fast basic evals infrastructure whose task scheduling adds practically no latency, then address the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs).
- Interfaces: ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into. Product managers need dashboards showing performance over time and what's going wrong in production.
- Architecture: Ensure code is well-architected so other team members and ML engineers can understand and build on it.

Ensuring evaluations are accurate and reliable

Evaluate how well Elicit helps with decision-making in pharma, encoding real knowledge about pharma customer decisions (e.g., choosing appropriate gold standards).
Provide appropriate statistical tests and confidence intervals.

Time allocation

60% on core eval platform.
15% working with the evals team to build and improve specific evals.
10% mentoring an evals engineering intern.
The rest on learning how users interact with Elicit and understanding their needs.

Requirements

At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines).
Aptitude for, and interest in, evaluating how Elicit helps with pharma decision-making.

Nice-to-haves

Knowledge of statistics (e.g., calculating power and confidence intervals for evals).
Experience with advanced Python (asyncio/trio and parallel processing strategies).
Front-end experience and strong UX sensibility (building dashboards). TypeScript experience is a plus.
Experience building developer tools.
Previous experience as a data engineer or working on AI infrastructure.
Knowledge of pharma/biomed.
Experience evaluating ML systems.
Experience building language-model-based systems.

Compensation

Career (L3): $140-170k + equity.
Senior (L4): $165-200k + equity.

Skills

Python, Asyncio, TypeScript, Statistics, Data Pipelines, AI Infrastructure, ML Evaluation, Language Models, Developer Tools, Dashboards
