# Research Engineer - Environments, Data and Post-Training
**Company:** [Mercor](https://hotfix.jobs/companies/mercor)
**Location:** San Francisco, CA
**Salary:** $130K-$500K
**Skills:** PyTorch, Machine Learning, LLMs, RLHF, RLVR, Synthetic Data, Post-Training, Evaluation Frameworks, SQL, NoSQL, APIs, Cloud Platforms, Data Structures, Algorithms, Backend Systems
**Posted:** 2026-03-25
> Develops post-training pipelines, RLVR experiments, synthetic data generation, and large-scale LLM evaluation systems to improve frontier language model performance in tool use, agentic behavior, and reasoning. Requires strong ML experience, strong coding skills, and a research background.
## Job Description
## Responsibilities
- Work on post-training and RLVR pipelines to understand how datasets, rewards, and training strategies impact model performance.
- Design and run reward-shaping experiments, and test algorithmic improvements (e.g., GRPO, DAPO), to improve LLM tool use, agentic behavior, and real-world reasoning.
- Quantify data usability, quality, and performance uplift on key benchmarks.
- Build and maintain data generation and augmentation pipelines that scale with training needs.
- Create and refine rubrics, evaluators, and scoring frameworks that guide training and evaluation decisions.
- Build and operate LLM evaluation systems, benchmarks, and metrics at scale.
- Collaborate closely with AI researchers, applied AI teams, and experts producing training data.
- Operate in a fast-paced, experimental research environment with rapid iteration cycles and high ownership.

## Requirements
- Strong applied research background, with a focus on post-training and/or model evaluation.
- Strong coding proficiency and hands-on experience working with machine learning models.
- Strong understanding of data structures, algorithms, backend systems, and core engineering fundamentals.
- Familiarity with APIs, SQL/NoSQL databases, and cloud platforms.
- Ability to reason deeply about model behavior, experimental results, and data quality.
- Excitement to work in person in San Francisco, five days a week (with optional remote Saturdays), and thrive in a high-intensity, high-ownership environment.

## Nice To Have
- Industry experience on a post-training team (highest priority).
- Publications at top-tier conferences (NeurIPS, ICML, ACL).
- Experience training models or evaluating model performance.
- Experience in synthetic data generation, LLM evaluations, or RL-style workflows.
- Work samples, artifacts, or code repositories demonstrating relevant skills.

## Benefits
- Generous equity grant vested over 4 years
- $10K housing bonus (if you live within 0.5 miles of our office)
- $1.5K monthly stipend for meals
- Free Equinox membership
- Health insurance

**Apply:** https://hotfix.jobs/jobs/research-engineer-environments-data-and-post-training-at-mercor-db1ab1f0-97f8-43cc-b4aa-79e5f9abf970