# Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI
**Company:** [Scale AI](https://hotfix.jobs/companies/scale-ai)
**Location:** San Francisco, CA; New York, NY; Seattle, WA
**Salary:** $218K-$273K
**Experience:** 5+ years
**Skills:** LLM Training, RLHF, RLVR, PPO, GRPO, Multi-Agent Systems, Reinforcement Learning, PyTorch, Kubernetes, Machine Learning
**Posted:** 2026-02-12
> Develops a next-generation agent RL training platform for enterprise GenAI, integrating cutting-edge research to train state-of-the-art models for complex use cases. Requires 5+ years of LLM training experience in production, RLHF expertise, recent publications at top conferences, and an advanced degree in CS or a related field.
## Job Description
## Responsibilities
- Train state-of-the-art models (internal and community-developed) for enterprise customers.
- Research and integrate cutting-edge algorithms into the training stack.
- Design solutions that let complex multi-agent systems learn from both process-based and outcome-based rewards.

## Requirements
- 5+ years of experience training LLMs in production environments.
- Experience with post-training methods (RLHF/RLVR) and algorithms (PPO/GRPO).
- Publications at top conferences (NeurIPS, ICLR, ICML) within the last 2 years.
- PhD or Master's in Computer Science or related field.
**Apply:** https://hotfix.jobs/jobs/staff-machine-learning-research-engineer-agent-post-training-enterprise-genai-6bf3f15a-309e-458e-83ae-7496a79f17ad
**Canonical:** https://hotfix.jobs/jobs/staff-machine-learning-research-engineer-agent-post-training-enterprise-genai-6bf3f15a-309e-458e-83ae-7496a79f17ad