# Member of Technical Staff - Post-Training and RL
**Company:** [xAI](https://hotfix.jobs/companies/xai)
**Location:** Palo Alto, CA
**Salary:** $180K-$600K
**Skills:** Reinforcement Learning, RLHF, DPO, Reward Modeling, AI Alignment, Post-Training, PyTorch, JAX, Transformers, Machine Learning
**Posted:** 2026-04-29
> Develops advanced post-training and reinforcement learning techniques like RLHF/DPO and reward modeling to enhance AI model reasoning, truthfulness, and real-world capabilities at xAI. Seeks passionate AI enthusiasts obsessed with truth-seeking models; prior experience preferred but not required.
## Job Description
## Responsibilities
- Work on critical post-training and reinforcement learning challenges, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities.

## Basic Qualifications
- Believe truth-seeking AI is the most important and challenging problem.
- Obsessed with building incredibly useful models through post-training and RL techniques.
- Power user of AI models and eager to push boundaries with reinforcement learning and alignment methods.
- Previous work on post-training, RLHF, or models used by millions is a big plus (relevant experience is not required).
- Take pride in your work and thrive in meritocratic environments.

## Compensation and Benefits
- $180,000 - $600,000 USD
- Equity, comprehensive medical, vision, and dental coverage
- Access to 401(k) retirement plan
- Short- and long-term disability insurance
- Life insurance
- Various other discounts and perks
**Apply:** https://hotfix.jobs/jobs/member-of-technical-staff-post-training-and-rl-at-xai-ed7e3533-4b73-4672-b792-579d9520ae65