# Research Engineer, Post-Training
**Company:** [Harvey](https://hotfix.jobs/companies/harvey)
**Location:** San Francisco, CA
**Salary:** $231K-$340K
**Skills:** Python, SFT, RLHF, RLAIF, Preference Optimization, Reward Modeling, Distillation, LLMs, Agents, Model Training, Evaluation, ML Systems
**Posted:** 2026-06-26
> Research engineer focused on post-training LLMs and agents for legal work. Requires hands-on experience training open-weight models and strong Python/research engineering skills.
## Job Description
## What You'll Do
- Drive post-training experiments, pushing agent performance while navigating the Pareto frontier of cost, latency, security, and governance.
- Optimize agent harnesses, including domain-specific skills, tools, subagents, retrieval strategies, and validation loops that improve quality on long-horizon legal work.
- Design and develop grading and reward systems that are reliable enough for evaluation, efficient enough for iteration, and strict enough for high-stakes legal work.
- Study agent behavior, identifying patterns that correlate with successful work product, and converting those findings into training data, evals, or harness changes.
- Work with Harvey researchers and external research partners to define experiments, evaluate methodology, review results, and keep projects moving toward concrete model improvements.

## What You Have
- Hands-on experience with post-training or model-training work, such as SFT, preference optimization, RLHF/RLAIF, reward modeling, distillation, or adapting open-weight models to specialized domains.
- Strong judgment about model behavior: you can read traces, inspect outputs, identify failure modes, and reason about whether a metric is measuring the thing that matters.
- Strong Python and research-engineering ability. You can write clean code, debug experiments, and build the simple but reliable systems needed to make research move faster.
- Ability to self-manage ambiguous applied research projects and communicate clearly with researchers, engineers, product teams, domain experts, and external partners.

## Nice to Have
- Experience building data or evaluation infrastructure for ML workflows, such as dataset curation pipelines, model-output processing, experiment tracking, evaluation dashboards, or regression analysis tooling.
- Experience with distributed training, inference systems, GPU workloads, or large-scale ML experimentation.
- Research publications, open-source contributions, or shipped industry work in LLMs, agents, evaluation, or ML systems.
**Apply:** https://hotfix.jobs/jobs/research-engineer-post-training-at-harvey-234eff6f-acf9-4254-98a2-b6f41d62bb8e
**Canonical:** https://hotfix.jobs/jobs/research-engineer-post-training-at-harvey-234eff6f-acf9-4254-98a2-b6f41d62bb8e