# Research Engineer — Reinforcement Learning
**Company:** [Firecrawl](https://hotfix.jobs/companies/firecrawl)
**Location:** Remote
**Salary:** $180K-$290K
**Experience:** 3+ years
**Skills:** Reinforcement Learning, RLHF, PPO, LLM Agents, Fine-Tuning, GPU Clusters, Reward Modeling, Policy Optimization, Training Infrastructure, Data Pipelines
**Posted:** 2026-03-18
> Builds training infrastructure, reward pipelines, and fine-tuning systems for RL-enhanced LLMs focused on web data extraction. Bridges classical RL and modern LLM agents, ships production models, runs fast experiments. Requires 3+ years in applied RL/ML engineering.
## What You'll Do

- Build training infrastructure and reward pipelines from scratch.
- Design and operate the systems that train and evaluate Firecrawl's models. Own the full loop — data collection, reward modeling, training runs, evaluation, and deployment.
- Fine-tune models to achieve state-of-the-art results on web data extraction, content understanding, and structured output generation.
- Bridge LLM agents and classical RL: design reward signals for agent behaviors and apply RL methods to improve multi-step agent workflows.
- Run fast experiments and iterate quickly.
- Communicate clearly with colleagues outside RL.
- Collaborate closely with the team.

## What We're Looking For

- Builds their own training infrastructure and reward pipelines: has operated GPU clusters, managed training runs, and debugged convergence issues in production.
- Can fine-tune models to state-of-the-art (SOTA) results and knows the full fine-tuning lifecycle: data curation, training dynamics, hyperparameter sensitivity, and evaluation methodology.
- Bridges LLM agents and classical RL: fluent in PPO, RLHF, reward modeling, policy optimization, and LLM agents.
- Production-minded: has deployed models serving real traffic and weighed tradeoffs between quality, latency, and cost.
- Runs fast experiments and communicates clearly.

**Backgrounds that tend to do well:** RL engineers at AI labs or applied ML teams who've shipped models to production; researchers who've done RLHF or reward modeling for LLM systems; ML engineers who've built training infrastructure at startups.

## Compensation & Benefits

**Salary:** $180,000–$290,000/year (range for U.S.-based employees in San Francisco, CA; adjusted for other locations).
**Equity:** Up to 0.15%.
**Other:** Generous PTO, parental leave, wellness stipend, learning & development, team offsites, sabbatical, full medical/dental/vision (US), 401(k), etc.
**Apply:** https://hotfix.jobs/jobs/research-engineer-reinforcement-learning-at-firecrawl-1d00f27b-8251-4810-aeda-dc71da69c5cc
**Canonical:** https://hotfix.jobs/jobs/research-engineer-reinforcement-learning-at-firecrawl-1d00f27b-8251-4810-aeda-dc71da69c5cc