Research Engineer — Reinforcement Learning

Builds training infrastructure, reward pipelines, and fine-tuning systems for RL-enhanced LLMs focused on web data extraction. Bridges classical RL and modern LLM agents, ships production models, runs fast experiments. Requires 3+ years in applied RL/ML engineering.

180k – 290k · San Francisco, CA · ML Engineering · Remote · 3+ YOE


About the role

What You'll Do

Build training infrastructure and reward pipelines from scratch.
Design and operate the systems that train and evaluate Firecrawl's models. Own the full loop — data collection, reward modeling, training runs, evaluation, and deployment.
Fine-tune models to achieve state-of-the-art results on web data extraction, content understanding, and structured output generation.
Bridge LLM agents and classical RL: design reward signals for agent behaviors, apply RL methods to improve multi-step agent workflows.
Run fast experiments and iterate quickly.
Communicate clearly with teammates who do not have an RL background.
Collaborate closely with the team.

What We're Looking For

Builds their own training infrastructure and reward pipelines: has operated GPU clusters, managed training runs, and debugged convergence issues in production.
Can fine-tune models to state-of-the-art results: understands the full fine-tuning lifecycle, including data curation, training dynamics, hyperparameter sensitivity, and evaluation methodology.
Bridges LLM agents and classical RL: fluent in PPO, RLHF, reward modeling, policy optimization, and LLM agents.
Production-minded: has deployed models serving real traffic and understands the tradeoffs between quality, latency, and cost.
Runs fast experiments and communicates clearly.

Backgrounds that tend to do well: RL engineers at AI labs or applied ML teams who've shipped models to production; researchers who've done RLHF or reward modeling for LLM systems; ML engineers who've built training infrastructure at startups.

Compensation & Benefits

Salary: $180,000–$290,000/year (U.S.-based in San Francisco, CA; adjusted for other locations).
Equity: Up to 0.15%.
Other: Generous PTO, parental leave, wellness stipend, learning & development, team offsites, sabbatical, full medical/dental/vision (US), 401(k), etc.

Skills

Reinforcement Learning, RLHF, PPO, LLM Agents, Fine-Tuning, GPU Clusters, Reward Modeling, Policy Optimization, Training Infrastructure, Data Pipelines
