Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI

Develops and optimizes post-training algorithms for agent RL platforms, focusing on LLM training, inference frameworks, and multi-agent systems. Requires 1-3 years of production LLM experience, expertise in PyTorch/CUDA and RLHF/PPO, and an advanced degree.

218k – 273k | San Francisco, CA | New York, NY | Seattle, WA | AI Research | Onsite | 1+ YOE

About the role

Responsibilities

Build, profile and optimize our training and inference framework.
Post-train state-of-the-art models, developed both internally and by the community, to define stable post-training recipes for our enterprise engagements.
Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation.
Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts.

Requirements

1-3 years of experience training LLMs in a production environment
Passionate about system optimization
Experience with post-training methods such as RLHF and RLVR, and related algorithms such as PPO and GRPO
Demonstrated understanding of modern GPU cluster architecture and how to operate it
Experience with multi-node LLM training and inference
Strong software engineering skills and proficiency in frameworks and tools such as CUDA, PyTorch, Transformers, and Flash Attention
Strong written and verbal communication skills for working in a cross-functional team environment
PhD or Master's degree in Computer Science or a related field

Skills

PyTorch, CUDA, Transformers, Flash Attention, RLHF, RLVR, PPO, GRPO, GPU Cluster, Multi-Node Training

Similar roles

AI Research jobs

Scale AI

Machine Learning Research Engineer, Agents - Enterprise GenAI

Develops and deploys state-of-the-art ML models and agents for enterprise GenAI using RL training and post-training algorithms. Requires 1-3 years of LLM production experience, RLHF expertise, recent top publications, and an advanced CS degree.

218k – 273k | San Francisco, CA +2 | AI Research | On-site | 1+ YOE | PPO | LLMs

Perplexity

AI Researcher

Advances AI products through post-training SOTA LLMs using supervised and reinforcement learning techniques on rich query datasets. Owns data pipelines, training frameworks, and model integration while collaborating across teams. Requires 2-6+ years in large-scale LLMs and Python/PyTorch expertise; PhD preferred.

220k – 485k | San Francisco, CA +1 | AI Research | On-site | 2+ YOE | SFT | DPO

Polymath

AI Research Resident

AI Research Resident collaborates on research projects developing benchmarks and environments for long-horizon AI agents, identifying model failure modes, and training autonomous agents. Requires current MS/PhD enrollment, RL experience, systems engineering, and strong publications.

200k – 200k | San Francisco, CA | AI Research | Remote | Entry level | Benchmarks | Frontier Models

Nuro

ML Research Scientist, Prediction & Smart Agents

Builds state-of-the-art ML models to predict traffic behavior for autonomous driving, using generative sequence modeling and controllable agents for planning and simulation. Requires 2+ years deploying ML systems and expertise in PyTorch and robotics ML; PhD preferred.

194k – 291k | Mountain View, CA | AI Research | On-site | 2+ YOE | C++ | Python

Nuro

Machine Learning Research Scientist: Generative Modeling for Planning

Develops state-of-the-art generative models, such as diffusion and flow-matching models, for autonomous planning in self-driving technology. Requires a PhD or MSc with 2-3 years of experience in generative modeling for robotics, strong Python/C++ skills, and top research publications.

160k – 241k | Mountain View, CA | AI Research | On-site | 2+ YOE | C++ | LLMs