What You'll Do
- Build training infrastructure and reward pipelines from scratch.
- Design and operate the systems that train and evaluate Firecrawl's models. Own the full loop — data collection, reward modeling, training runs, evaluation, and deployment.
- Fine-tune models to achieve state-of-the-art results on web data extraction, content understanding, and structured output generation.
- Bridge LLM agents and classical RL: design reward signals for agent behaviors and apply RL methods to improve multi-step agent workflows.
- Run fast experiments and iterate quickly.
- Communicate clearly with people who don't have an RL background.
- Collaborate closely with the team.
What We're Looking For
- Builds their own training infra and reward pipelines: has operated GPU clusters, managed training runs, and debugged convergence issues in production.
- Can fine-tune models to SOTA: full fine-tuning lifecycle, data curation, training dynamics, hyperparameter sensitivity, evaluation methodology.
- Bridges LLM agents and classical RL: fluent in PPO, RLHF, reward modeling, policy optimization, and LLM agents.
- Production-minded: has deployed models serving real traffic and weighed tradeoffs between quality, latency, and cost.
- Runs fast experiments and communicates clearly.
Backgrounds that tend to do well: RL engineers at AI labs or applied ML teams who've shipped models to production; researchers who've done RLHF or reward modeling for LLM systems; ML engineers who've built training infrastructure at startups.
Compensation & Benefits
Salary: $180,000–$290,000/year (U.S.-based in San Francisco, CA; adjusted for other locations).
Equity: Up to 0.15%.
Other: Generous PTO, parental leave, wellness stipend, learning & development, team offsites, sabbatical, full medical/dental/vision (U.S.), 401(k), etc.