Research Engineer, Post-Training

Research engineer focused on post-training LLMs and agents for legal work. Requires hands-on experience training open-weight models and strong Python/research engineering skills.

231k – 340k · San Francisco, CA · ML Engineering · Hybrid

About the role

What You'll Do

  • Drive post-training experiments, pushing agent performance while navigating the Pareto frontier of cost, latency, security, and governance.
  • Optimize agent harnesses, including domain-specific skills, tools, subagents, retrieval strategies, and validation loops that improve quality on long-horizon legal work.
  • Design and develop grading and reward systems that are reliable enough for evaluation, efficient enough for iteration, and strict enough for high-stakes legal work.
  • Study agent behavior, identifying patterns that correlate with successful work product, and converting those findings into training data, evals, or harness changes.
  • Work with Harvey researchers and external research partners to define experiments, evaluate methodology, review results, and keep projects moving toward concrete model improvements.

What You Have

  • Hands-on experience with post-training or model-training work, such as SFT, preference optimization, RLHF/RLAIF, reward modeling, distillation, or adapting open-weight models to specialized domains.
  • Strong judgment about model behavior: you can read traces, inspect outputs, identify failure modes, and reason about whether a metric is measuring the thing that matters.
  • Strong Python and research-engineering ability. You can write clean code, debug experiments, and build the simple but reliable systems needed to make research move faster.
  • Ability to self-manage ambiguous applied research projects and communicate clearly with researchers, engineers, product teams, domain experts, and external partners.

Nice to Have

  • Experience building data or evaluation infrastructure for ML workflows, such as dataset curation pipelines, model-output processing, experiment tracking, evaluation dashboards, or regression analysis tooling.
  • Experience with distributed training, inference systems, GPU workloads, or large-scale ML experimentation.
  • Research publications, open-source contributions, or shipped industry work in LLMs, agents, evaluation, or ML systems.

Skills

Python, SFT, RLHF, RLAIF, Preference Optimization, Reward Modeling, Distillation, LLMs, Agents, Model Training, Evaluation, ML Systems

Similar roles

ML Engineering jobs

Technical Lead Manager - Perception, Self-Driving Systems

Leads team developing and deploying a unified camera-first perception model for self-driving systems across diverse vehicles, geographies, and conditions. Hands-on with ML architecture, training, evaluation, embedded optimization, and customer requirements. Requires 5+ years ML perception experience and 2+ years team leadership.

232k – 298k · Sunnyvale, CA · ML Engineering · On-site · 5+ YOE · C++ · BEV

AI Systems Engineer, Codex Agents

Builds core agent harness for Codex AI agents, enabling safe tool use, code execution, and long-horizon tasks in production. Designs systems for sandboxing, evaluation, observability, and performance optimization across ML workflows and infrastructure.

230k – 385k · San Francisco, CA · ML Engineering · On-site · Rust · LLMs

Ads Conversion Modeling, Machine Learning Engineering Manager

Leads machine learning team developing conversion models for Reddit Ads, focusing on predictive modeling for user actions like purchases and signups. Requires deep ML expertise, ads domain knowledge, and 2+ years managing high-performing ML teams.

230k – 322k · United States · ML Engineering · Remote · PyTorch · TensorFlow

Applied AI Engineer, Codex Core Agent

Develops and improves Codex AI agents for real-world software engineering tasks, focusing on performance, reliability, and integration with research and product teams. Requires strong Python, ML/LLM experience, and skills in evaluation, prompting, and debugging production failures.

230k – 325k · San Francisco, CA +2 · ML Engineering · On-site · LLMs · Python

Software Engineer, Marketing Innovation

Build and own autonomous agentic systems for customer-facing revenue and marketing workflows, partnering with sales and marketing teams. Requires 4+ years experience in software/ML engineering, full-stack skills in Python/JavaScript, and production systems expertise.

230k – 385k · San Francisco, CA · ML Engineering · On-site · 4+ YOE · APIs · Python