Researcher, Artifacts - Agent Post-Training

250k – 380k · San Francisco, CA · Hybrid · May 22

Summary

Train frontier models at OpenAI to create polished, useful artifacts like documents, spreadsheets, and dashboards. Own post-training improvements across RL, data pipelines, evals, and graders to ship production agent capabilities.

About the role

Responsibilities

Design and run experiments that improve agentic model behavior for complex software and plugins
Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis
Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions
Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior
Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness
Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments
Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes

Requirements

Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, with the ability to learn quickly across new areas
Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems
Excitement for open-ended problems where the path is unclear, the signal is noisy, and the right answer requires both research taste and engineering execution
Focus on product impact and model behavior, with opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with
Ability to move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide what to do next
Comfort working across research, product, infrastructure, data, evals, and safety boundaries, with clear communication skills
Willingness to build load-bearing systems and processes when needed, even if the work is not glamorous
Drive to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users

Nice-to-Haves

Background in consulting, finance, marketing, operations, or data science

Skills

Machine Learning, Reinforcement Learning, RLHF, RLAIF, LLMs, Evals, Synthetic Data, Model Training, Post-training, Grader Systems, Python, Statistics

Similar roles at this salary range

Airbnb

Jun 8

Senior Staff Machine Learning Engineer, Communication & Connectivity

Lead ML architecture and implementation for Airbnb's Messaging & Notifications, building recommendation engines, ranking systems, and LLM-powered experiences while mentoring engineers.

244k – 305k · United States · ML Engineering · Remote · Python, AI Systems

Traba

Jun 8

Staff Software Engineer

Founding Staff Applied Agent Engineer to architect and lead Traba's agentic platform, building production LLM/agent systems that integrate with customer WMS/TMS/ERP and drive industrial operations. Requires 7+ years engineering experience with 2+ years building production agent systems.

240k – 300k · New York, NY +1 · ML Engineering · On-site · LLM, Kafka

Traba

Jun 8

Senior Software Engineer

Founding Senior Applied Agent Engineer building production LLM agent systems that automate supply chain workflows. Requires 5+ years engineering experience with 1+ year shipping LLM/agent features, strong Python/TypeScript skills, and hands-on agent stack experience.

200k – 240k · New York, NY +1 · ML Engineering · On-site · Python, Node.js

Cribl

Jun 7

Staff Software Engineer, Cribl AI

Staff-level AI/ML engineer building and productionizing generative AI features across backend and frontend for Cribl's observability platform. Requires 6+ years experience, AI/ML and MLOps background, and TypeScript/JavaScript proficiency.

225k – 265k · United States · ML Engineering · Remote · LLMs, React

Perplexity

Jun 6

Member of Technical Staff

ML Engineer building and optimizing production recommendation, ranking, and personalization systems that integrate LLMs for Perplexity's AI product.

220k – 405k · San Francisco, CA +1 · ML Engineering · On-site · LLMs, Feature Stores