# Evaluation Engineer
**Company:** [Elicit](https://hotfix.jobs/companies/elicit)
**Location:** Remote
**Salary:** $140K-$200K
**Experience:** 3+ years
**Skills:** Python, Asyncio, TypeScript, Statistics, Data Pipelines, AI Infrastructure, ML Evaluation, Language Models, Developer Tools, Dashboards
**Posted:** 2026-01-22
> Builds fast, reliable auto-evaluation infrastructure for an AI research platform focused on supporting pharma decision-making. Owns backend systems, ML eval interfaces and dashboards, and statistical reliability; requires 3+ years of backend experience.
## Job Description
## Responsibilities

**Core auto-eval platform**
- Build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:
  - Speed: Build lightning-fast basic evals infrastructure whose task scheduling adds practically no latency, and address the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it with LMs).
  - Interfaces: ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into. Product managers need dashboards showing performance over time and what's going wrong in production.
  - Architecture: Ensure code is well-architected so other team members and ML engineers can understand and build on it.

**Ensuring evaluations are accurate and reliable**
- Evaluate how well Elicit helps with decision-making in pharma, encoding real knowledge about pharma customer decisions (e.g., choosing appropriate gold standards).
- Provide appropriate statistical tests and confidence intervals.

**Time allocation**
- 60% on core eval platform.
- 15% working with the evals team to build and improve specific evals.
- 10% mentoring an evals engineering intern.
- Rest on learning about user interactions and understanding user needs.

## Requirements
- At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines).
- Aptitude for and interest in evaluating how Elicit helps with pharma decision-making.

## Nice-to-haves
- Knowledge of statistics (e.g., calculating power and confidence intervals for evals).
- Experience with advanced Python (asyncio/trio and parallel processing strategies).
- Front-end experience and strong UX sensibility (building dashboards). TypeScript experience is a plus.
- Experience building developer tools.
- Previous experience as a data engineer or working on AI infrastructure.
- Knowledge of pharma/biomed.
- Experience evaluating ML systems.
- Experience building language-model-based systems.

## Compensation
- Career (L3): $140K-$170K + equity.
- Senior (L4): $165K-$200K + equity.
**Apply:** https://hotfix.jobs/jobs/evaluation-engineer-at-elicit-6adb5c39-7295-461e-83e4-305043e54d21
**Canonical:** https://hotfix.jobs/jobs/evaluation-engineer-at-elicit-6adb5c39-7295-461e-83e4-305043e54d21