# Research Engineer, Reinforcement Learning
**Company:** [TensorStax](https://hotfix.jobs/companies/tensorstax)
**Location:** San Francisco, CA
**Skills:** Reinforcement Learning, PPO, DPO, KTO, RLHF, PyTorch, Gym, GRPO, SWE-Gym, SWE-RL
**Posted:** 2025-03-17
> Develops RL environments and fine-tunes language models using PPO, DPO, and KTO to enhance agentic capabilities for data infrastructure tasks. Requires deep RL expertise, LLM fine-tuning knowledge, and strong problem-solving skills.
## Job Description
## Responsibilities
- Develop and refine reward functions to optimize agent behavior for complex data engineering tasks.
- Create RL gym environments for language model agents.
- Fine-tune language models using reinforcement learning techniques such as PPO, DPO, and KTO.
- Stay at the forefront of research on RL for language models, incorporating advancements like GRPO, SWE-Gym, and SWE-RL into practical applications.
- Curate and build high-quality datasets for supervised fine-tuning (SFT) and RLHF.
- Design experiments to evaluate and improve the agentic capabilities of language models in data environments.

## Requirements
- Deep understanding of reinforcement learning, reward shaping, and optimization strategies.
- Strong familiarity with LLM fine-tuning techniques (PPO, DPO, KTO) and their applications in reinforcement learning.
- Knowledge of recent advancements in RL for language models (GRPO, SWE-Gym, SWE-RL).
- Experience curating and constructing high-quality datasets for fine-tuning.
- Strong problem-solving skills and a history of working on complex ML projects.
- High agency: the ability to work independently, experiment proactively, and drive research initiatives forward.

## Nice-to-Haves
- Experience with distributed training in PyTorch (DDP, FSDP).
- Hands-on experience designing RL environments for traditional RL problems.
- Contributions to open-source projects in RL, LLMs, or ML infrastructure.
- Familiarity with data lakes and warehouses (Snowflake, BigQuery, Redshift).

## Benefits
- 100% employer-covered health, dental, and vision insurance.
- 401(k) with company match.
- Access to Bay Club or Equinox in San Francisco.
**Apply:** https://hotfix.jobs/jobs/research-engineer-reinforcement-learning-at-tensorstax-6d885679-6d30-4738-82af-93a8feef5c4b
**Canonical:** https://hotfix.jobs/jobs/research-engineer-reinforcement-learning-at-tensorstax-6d885679-6d30-4738-82af-93a8feef5c4b