# Research Engineer – Evals
**Company:** [Firecrawl](https://hotfix.jobs/companies/firecrawl)
**Location:** San Francisco, CA
**Salary:** $160K-$240K
**Experience:** 3+ years
**Skills:** LLM Evaluation, ML Engineering, Pipelines, CI/CD, RLHF, Benchmark Datasets, LLM-as-Judge, Data Quality, Human Review, Reward Signals
**Posted:** 2026-05-13
> Builds evaluation systems to measure Firecrawl's web data extraction quality across diverse websites and workflows. Designs metrics, pipelines, benchmarks, and LLM judges; integrates into CI/CD and model training loops. Requires 3+ years in ML engineering or data quality with production systems.
## Job Description
## What You’ll Do

- Build the eval stack from scratch. Design and own the systems that measure whether Firecrawl's outputs are actually good — across scrape, crawl, extract, and map. That means defining metrics, building pipelines, curating datasets, and integrating evals into CI/CD so regressions get caught before they ship.
- Design benchmarks that reflect reality. Build benchmark datasets that cover the real distribution of what customers send, including edge cases.
- Own LLM-as-judge pipelines. Design and validate automated judges that score extraction quality at scale, and build human review tooling.
- Close the loop with models and RL. Turn quality measurements into reward signals and feedback loops.
- Run fast experiments and communicate clearly.

## What We're Looking For

- Builds their own eval infrastructure: pipelines, datasets, rubrics, judges.
- Knows what "good" means for unstructured web data.
- Fluent in LLM evaluation methodology: LLM-as-judge, rubrics, human review.
- Production-minded: evals reflect real production behavior.
- Fast and clear.

**Backgrounds that tend to do well:** ML engineers with experience in eval or data quality systems, LLM fine-tuning or RLHF, or data infra and model development.

**Bonus Points:** Experience at a scraping, automation, or security startup; ex-founder.
**Apply:** https://hotfix.jobs/jobs/research-engineer-evals-at-firecrawl-5d35303f-5667-41b7-a03c-dc6f8f99313e
**Canonical:** https://hotfix.jobs/jobs/research-engineer-evals-at-firecrawl-5d35303f-5667-41b7-a03c-dc6f8f99313e