Agent Post-Training, Personality

Help shape OpenAI agent personality by turning qualitative collaboration insights into evals, training data, reward signals, and model improvements that reach production.

295k – 445k | San Francisco, CA | ML Engineering | On-site | 7+ YOE

About the role

Responsibilities

  • Develop a rigorous understanding of what makes an agent a great collaborator across professional, creative, technical, and everyday work.
  • Turn qualitative judgments about model behavior into concrete hypotheses, evals, graders, and training interventions.
  • Study explicit and implicit user signals to understand which behaviors create trust, satisfaction, continued use, and successful outcomes.
  • Work with human experts and trainers to produce high-quality, tasteful rollouts and preference data that capture excellent collaborative behavior.
  • Improve reward models and RL objectives for model behaviors.
  • Work with pretraining and early-training teams on data mixtures, objectives, synthetic data, and other upstream choices that shape downstream personality.
  • Build sustainable pipelines for updating older training data as our understanding of excellent model behavior evolves.
  • Partner closely with ChatGPT, Codex, and other product teams to turn consumer insight into model improvements and validate them in real workflows.
  • Own projects end to end, from observing a subtle behavioral failure through experimentation, training, evaluation, and launch.

Requirements

  • Strong technical foundations in machine learning, software engineering, statistics, behavioral science, HCI, or a related field.
  • Strong taste for model behavior: can look at user feedback and explain why one response feels thoughtful, natural, and useful while another does not.
  • Experience with LLMs, post-training, RL/RLHF, reward modeling, evals, synthetic data, pretraining data, or production ML systems.
  • Ability to work effectively with researchers, engineers, product teams, designers, domain experts, and human-data teams, within safety boundaries, and to communicate clearly with each group.
  • Ability to translate subjective-seeming product questions into falsifiable hypotheses and rigorous evaluations without losing the nuance that made the question important.
  • Care about preserving individuality, adaptability, and behavioral diversity rather than optimizing every model toward one narrow style.
  • Excited by ambiguous capability problems where the signal is noisy, the failures are qualitative, and the solution may involve data, training, evals, product changes, or all of the above.
  • Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous.

Nice-to-Haves

  • Think instinctively from the user’s perspective and care deeply about how models feel to work with, not only how they perform on benchmarks.
  • Want to shape how frontier agents communicate, collaborate, and build trust with millions of people.
  • Want to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users.

Skills

Machine Learning, Software Engineering, Statistics, Behavioral Science, HCI, LLMs, Post-Training, RL, RLHF, Reward Modeling, Evals, Synthetic Data, Pretraining Data, Production ML Systems

Similar roles

ML Engineering jobs

Agent Post-Training, Artifacts Research

Train frontier models to generate polished artifacts (docs, spreadsheets, slides) by owning post-training improvements across RL, data, evals, and alignment. Requires strong ML fundamentals and hands-on LLM/RL experience.

295k – 445k | San Francisco, CA | ML Engineering | On-site | 7+ YOE | LLMs | RLHF

Agent Post-Training, Computer Use Research

Train frontier models to operate computers, browsers, and desktops. Design experiments, build evals, own post-training pipelines (RL, data, graders), and ship improvements into OpenAI agents.

295k – 445k | San Francisco, CA | ML Engineering | On-site | 7+ YOE | RLHF | LLMs

Agent Post-Training, Connectors Research

Train frontier agents to interface with professional software via code, APIs, and structured integrations. Design experiments, own post-training improvements (RL, evals, data), and ship capabilities into major model runs.

295k – 445k | San Francisco, CA | ML Engineering | On-site | 7+ YOE | RLHF | LLMs

Context Researcher

Context Researcher on the Agent Post-Training team, scaling compute on context for frontier agent models. Designs experiments, owns post-training improvements, builds evals, and ships capabilities into Codex and ChatGPT.

295k – 445k | San Francisco, CA | ML Engineering | On-site | 7+ YOE | LLMs | RLHF

Research Engineer/Research Scientist

Research Engineer/Scientist improving model capabilities for personalized AI experiences. Focus on tool-use, instruction following, evaluations, and training improvements. Requires strong ML engineering and research experience.

295k – 555k | San Francisco, CA | ML Engineering | Hybrid | 7+ YOE | Python | Research