Researcher, Computer Use - Agent Post-Training

250k – 380k | San Francisco, CA | Onsite | May 22

Summary

Train frontier models to operate computers, navigate browsers and desktops, and complete complex workflows. Own post-training experiments, evals, and RL pipelines, and ship improvements into OpenAI's agent products.

About the role

Responsibilities

Design and run experiments that improve agentic model behavior for complex computer use, in both desktop and browser environments.
Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements.
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments.
Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes.

Requirements

Have strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, and can learn quickly in areas you have not worked in before.
Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems.
Are excited by open-ended problems where the path is unclear, the signal is noisy, and the right answer requires both research taste and engineering execution.
Care about product impact and model behavior, not just benchmark movement. Have opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with.
Can move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide what to do next.
Are comfortable working across research, product, infrastructure, data, evals, and safety boundaries, and can communicate clearly with each group.
Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous.
Want to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users.

Skills

Machine Learning, Reinforcement Learning, RLHF, LLMs, Post-training, Evaluations, Synthetic Data, Model Training, Software Engineering, Statistics
