Researcher, Connectors - Agent Post-Training

250k – 380k · San Francisco, CA · Hybrid · May 22

Summary

Train frontier agents to use code, APIs, and enterprise tools (Slack, GitHub, Salesforce, etc.) by designing RL experiments, building evals, and owning the post-training stack that ships into Codex and ChatGPT.

About the role

Responsibilities

Design and run experiments that improve agentic model behavior for complex software and plugins
Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis
Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions
Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior
Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness
Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments
Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes

Requirements

Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field
Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems
Ability to move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide what to do next
Comfortable working across research, product, infrastructure, data, evals, and safety boundaries

Nice-to-Haves

Excited by open-ended problems where the path is unclear and the signal is noisy
Care about product impact and model behavior, not just benchmark movement
Can communicate clearly across research, product, infrastructure, data, evals, and safety groups
Like building load-bearing systems and processes when that is what the team needs

Skills

Machine Learning, Reinforcement Learning, LLMs, RLHF, RLAIF, Post-training, Evals, Synthetic Data, Model Training, Production ML Systems

Similar roles at this salary range

Airbnb

Jun 8

Senior Staff Machine Learning Engineer, Communication & Connectivity

Lead ML architecture and implementation for Airbnb's Messaging & Notifications, building recommendation engines, ranking systems, and LLM-powered experiences while mentoring engineers.

244k – 305k · United States · ML Engineering · Remote · Python · AI Systems

Traba

Jun 8

Staff Software Engineer

Founding Staff Applied Agent Engineer to architect and lead Traba's agentic platform, building production LLM/agent systems that integrate with customer WMS/TMS/ERP systems and drive industrial operations. Requires 7+ years of engineering experience, including 2+ years building production agent systems.

240k – 300k · New York, NY +1 · ML Engineering · On-site · LLM · Kafka

Traba

Jun 8

Senior Software Engineer

Founding Senior Applied Agent Engineer building production LLM agent systems that automate supply chain workflows. Requires 5+ years of engineering experience, including 1+ year shipping LLM/agent features, strong Python/TypeScript skills, and hands-on agent stack experience.

200k – 240k · New York, NY +1 · ML Engineering · On-site · Python · Node.js

Cribl

Jun 7

Staff Software Engineer, Cribl AI

Staff-level AI/ML engineer building and productionizing generative AI features across the backend and frontend of Cribl's observability platform. Requires 6+ years of experience, an AI/ML and MLOps background, and TypeScript/JavaScript proficiency.

225k – 265k · United States · ML Engineering · Remote · LLMs · React

Perplexity

Jun 6

Member of Technical Staff

ML Engineer building and optimizing production recommendation, ranking, and personalization systems that integrate LLMs for Perplexity's AI product.

220k – 405k · San Francisco, CA +1 · ML Engineering · On-site · LLMs · Feature Stores