Agent Post-Training, Frontier Evals and Environments Research

Researcher building frontier RL environments, evaluations, and training signals to steer OpenAI's largest agent training runs and measure model capabilities.

295k – 445k · San Francisco, CA · AI Research · Onsite · 7+ YOE

About the role

Responsibilities

  • Create ambitious RL environments to push frontier models to their limits and measure model capabilities, skills, and behaviors
  • Develop new methodologies for automatically exploring model behavior
  • Dive deep into the science of measurement, including scalability, reliability, and variance of evaluation methodology
  • Help steer the largest training runs
  • Design scalable systems and processes to support continuous evaluation
  • Build self-improvement loops to automate model understanding

Requirements

  • Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field
  • Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems
  • Ability to move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide next steps
  • Comfort working across research, product, infrastructure, data, evals, and safety boundaries

Nice-to-Haves

  • Excitement for open-ended problems where the path is unclear and the signal is noisy
  • Concern for product impact and model behavior, not just benchmark movement
  • Opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with
  • Willingness to build load-bearing systems and processes even when the work is not glamorous

Skills

Machine Learning, Software Engineering, Statistics, LLMs, Reinforcement Learning, RLHF, RLAIF, Post-Training, Evaluations, Graders, Synthetic Data, Model Training, Coding Agents, Tool-Using Agents, Production ML Systems
