Research Engineer - Environments, Data and Post-Training

Develops post-training pipelines, RLVR experiments, synthetic data generation, and large-scale LLM evaluation systems to enhance frontier language model performance in tool use, agentic behavior, and reasoning. Requires strong ML experience, coding skills, and research background.

130k – 500k | San Francisco, CA | ML Engineering | Onsite

About the role

Responsibilities

  • Work on post-training and RLVR pipelines to understand how datasets, rewards, and training strategies impact model performance.
  • Design and run reward-shaping experiments and algorithmic improvements (e.g., GRPO, DAPO) to improve LLM tool-use, agentic behavior, and real-world reasoning.
  • Quantify data usability, quality, and performance uplift on key benchmarks.
  • Build and maintain data generation and augmentation pipelines that scale with training needs.
  • Create and refine rubrics, evaluators, and scoring frameworks that guide training and evaluation decisions.
  • Build and operate LLM evaluation systems, benchmarks, and metrics at scale.
  • Collaborate closely with AI researchers, applied AI teams, and experts producing training data.
  • Operate in a fast-paced, experimental research environment with rapid iteration cycles and high ownership.

Requirements

  • Strong applied research background, with a focus on post-training and/or model evaluation.
  • Strong coding proficiency and hands-on experience working with machine learning models.
  • Strong understanding of data structures, algorithms, backend systems, and core engineering fundamentals.
  • Familiarity with APIs, SQL/NoSQL databases, and cloud platforms.
  • Ability to reason deeply about model behavior, experimental results, and data quality.
  • Excitement to work in person in San Francisco, five days a week (with optional remote Saturdays), and thrive in a high-intensity, high-ownership environment.

Nice To Have

  • Real-world post-training team experience in industry (highest priority).
  • Publications at top-tier conferences (NeurIPS, ICML, ACL).
  • Experience training models or evaluating model performance.
  • Experience in synthetic data generation, LLM evaluations, or RL-style workflows.
  • Work samples, artifacts, or code repositories demonstrating relevant skills.

Benefits

  • Generous equity grant vesting over 4 years
  • $10K housing bonus (if you live within 0.5 miles of our office)
  • $1.5K monthly stipend for meals
  • Free Equinox membership
  • Health insurance

Skills

PyTorch, Machine Learning, LLMs, RLHF, RLVR, Synthetic Data, Post-Training, Evaluation Frameworks, SQL, NoSQL, APIs, Cloud Platforms, Data Structures, Algorithms, Backend Systems

Similar roles

ML Engineering jobs

AI Product Engineer

Build agentic capabilities on a petabyte-scale observability platform. Own the full agent stack including context engineering, tool design, evals, and production reliability for incident investigation.

130k – 230k | United States | ML Engineering | Remote | 5+ YOE | MCP | SQL

Software Engineer, AI Data & Evaluation

As a Senior Software Engineer, AI Data & Evaluation, you will build data infrastructure and evaluation systems for frontier AI models. This role involves designing evaluation methodologies, building synthetic data generation systems, and architecting operational automation.

130k – 500k | San Francisco, CA | ML Engineering | On-site | System Design | AI/ML Data Pipelines

Machine Learning Engineer, Marketplace

Build ML models and decision systems for search, ranking, candidate-job matching, and marketplace optimization at a fast-growing AI talent platform.

130k – 500k | San Francisco, CA | ML Engineering | On-site | 3+ YOE | Go | RAG

Software Engineer, Applied AI

Builds and deploys custom integrations, APIs, and scalable solutions for enterprise clients using Mercor's AI platform. Requires strong engineering skills in modern languages and cloud environments, with customer-facing experience.

130k – 500k | San Francisco, CA | ML Engineering | On-site | Go | AWS

Software Engineer, Applied AI

Builds and operates scalable data pipelines and systems for post-training workflows, model evaluations, and synthetic data generation. Partners with frontier AI labs and customers, requiring strong backend skills in Python/Go/Rust and ML evaluation expertise.

130k – 500k | San Francisco, CA | ML Engineering | On-site | Go | Rust