# Full-Stack Software Engineer, Reinforcement Learning
**Company:** [Anthropic](https://hotfix.jobs/companies/anthropic)
**Location:** San Francisco, CA; New York, NY
**Salary:** $300K-$405K
**Skills:** Python, React, TypeScript, GCP, AWS, Docker, Asyncio, Trio, APIs, CI/CD
**Posted:** 2026-04-14
> Build full-stack platforms, tools, and UIs for RL environment creation, data collection at scale, and training observability to improve AI models like Claude. Requires strong Python, modern web stack proficiency, high agency, and ability to ship reliable systems quickly in a fast-paced environment.
## Job Description
## What You'll Do

- Build and extend web platforms for RL environment creation, management, and quality review — including environment configuration, versioning, and validation workflows
- Develop vendor-facing interfaces and tooling that let external partners create, submit, and iterate on training environments with minimal friction
- Design and implement platforms for human data collection at scale, including labeling workflows, quality assurance systems, and feedback mechanisms that surface reward signal integrity issues early
- Build evaluation dashboards and observability UIs that give researchers real-time insight into environment quality, training run health, and reward hacking
- Create backend services and APIs that connect environment authoring tools, data collection systems, and RL training infrastructure
- Build and expand scalable code data generation pipelines, producing diverse programming tasks with robust reward signals across languages and difficulty levels
- Develop onboarding automation and documentation tooling so new vendors and internal users ramp up in hours, not weeks
- Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products

## You May Be a Good Fit If You

- Have strong software engineering fundamentals and real full-stack range — you're comfortable owning a surface from database schema to frontend
- Are proficient in **Python** and a modern web stack (**React**, **TypeScript**, or similar)
- Have a track record of shipping systems that solved a hard problem, not just systems that shipped on time: for example, you built the thing that made your team 10x faster, or the internal tool nobody thought was possible
- Operate with high agency: you identify what needs to be done and drive it forward without waiting for a ticket
- Have found yourself wondering "why isn't this moving faster?" in previous roles, and then have done something about it
- Care about UX and can build interfaces that are intuitive for both technical researchers and non-technical labelers
- Communicate clearly with researchers, operations teams, and engineers, and can turn vague asks into well-scoped work
- Thrive in a fast-moving environment where priorities shift, Claude is your pair programmer, and the next problem is often one nobody has solved before

## Strong Candidates May Also Have

- Built data collection, labeling, or annotation platforms — ideally ones that had to scale across many vendors or many task types
- Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows
- Experience with cloud infrastructure (**GCP** or **AWS**), **Docker**, and CI/CD pipelines
- Familiarity with LLM training, fine-tuning, or evaluation workflows
- Experience with async Python (**Trio**, **asyncio**) or high-throughput API design
- Background in dashboards, monitoring, or observability tooling
- Experience working directly with external vendors or partners on technical integrations

**Apply:** https://hotfix.jobs/jobs/full-stack-software-engineer-reinforcement-learning-at-anthropic-e83fcb7c-8acc-457c-ae8f-042e87a2b786
**Canonical:** https://hotfix.jobs/jobs/full-stack-software-engineer-reinforcement-learning-at-anthropic-e83fcb7c-8acc-457c-ae8f-042e87a2b786