Research Engineer — Reinforcement Learning

Builds training infrastructure, reward pipelines, and fine-tuning systems for RL-enhanced LLMs focused on web data extraction. Bridges classical RL and modern LLM agents, ships production models, runs fast experiments. Requires 3+ years in applied RL/ML engineering.

180k – 290k | San Francisco, CA | ML Engineering | Remote | 3+ YOE

About the role

What You'll Do

  • Build training infrastructure and reward pipelines from scratch.
  • Design and operate the systems that train and evaluate Firecrawl's models. Own the full loop: data collection, reward modeling, training runs, evaluation, and deployment.
  • Fine-tune models to achieve state-of-the-art results on web data extraction, content understanding, and structured output generation.
  • Bridge LLM agents and classical RL: design reward signals for agent behaviors and apply RL methods to improve multi-step agent workflows.
  • Run fast experiments and iterate quickly.
  • Communicate clearly with teammates who do not have an RL background.
  • Collaborate closely with the team.

What We're Looking For

  • Builds their own training infrastructure and reward pipelines: has operated GPU clusters, managed training runs, and debugged convergence issues in production.
  • Can fine-tune models to state-of-the-art results: understands the full fine-tuning lifecycle, including data curation, training dynamics, hyperparameter sensitivity, and evaluation methodology.
  • Bridges LLM agents and classical RL: fluent in PPO, RLHF, reward modeling, policy optimization, and LLM agents.
  • Production-minded: has deployed models serving real traffic and weighed tradeoffs between quality, latency, and cost.
  • Runs fast experiments and communicates clearly.

Backgrounds that tend to do well: RL engineers at AI labs or applied ML teams who've shipped models to production; researchers who've done RLHF or reward modeling for LLM systems; ML engineers who've built training infrastructure at startups.

Compensation & Benefits

  • Salary: $180,000–$290,000/year (U.S.-based in San Francisco, CA; adjusted for other locations).
  • Equity: Up to 0.15%.
  • Other: Generous PTO, parental leave, wellness stipend, learning & development, team offsites, sabbatical, full medical/dental/vision (US), 401(k), etc.

Skills

Reinforcement Learning, RLHF, PPO, LLM Agents, Fine-Tuning, GPU Clusters, Reward Modeling, Policy Optimization, Training Infrastructure, Data Pipelines
