Researcher, Artifacts - Agent Post-Training

250k – 380k | San Francisco, CA | Hybrid
Summary

Train frontier models at OpenAI to create polished, useful artifacts like documents, spreadsheets, and dashboards. Own post-training improvements across RL, data pipelines, evals, and graders to ship production agent capabilities.

About the role

Responsibilities

  • Design and run experiments that improve agentic model behavior when working with complex software and plugins
  • Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis
  • Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions
  • Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements
  • Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior
  • Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs
  • Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness
  • Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments
  • Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes

Requirements

  • Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, with the ability to learn quickly across new areas
  • Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems
  • Excitement for open-ended problems where the path is unclear, the signal is noisy, and the right answer requires both research taste and engineering execution
  • Focus on product impact and model behavior, with opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with
  • Ability to move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide what to do next
  • Comfort working across research, product, infrastructure, data, evals, and safety boundaries, and the ability to communicate clearly across them
  • Willingness to build load-bearing systems and processes when needed, even if the work is not glamorous
  • Drive to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users

Nice-to-Haves

  • Background in consulting, finance, marketing, operations, or data science
Skills
Machine Learning, Reinforcement Learning, RLHF, RLAIF, LLMs, Evals, Synthetic Data, Model Training, Post-training, Grader Systems, Python, Statistics