Research Engineer – Evals
Builds evaluation systems that measure Firecrawl's web data extraction quality across diverse websites and workflows. Designs metrics, pipelines, benchmarks, and LLM judges, and integrates them into CI/CD and model training loops. Requires 3+ years of ML engineering or data quality experience with production systems.