# Research Engineer, Core ML
**Company:** [Together AI](https://hotfix.jobs/companies/together-ai)
**Location:** San Francisco, CA
**Salary:** $200K-$280K
**Experience:** 3+ years
**Skills:** Python, SGLang, vLLM, RLHF, GRPO, DPO, Speculative Decoding, ATLAS, GPU Optimization, Distributed Systems
**Posted:** 2026-02-18
> Research Engineer building production ML systems at the intersection of efficient inference, RL/post-training, and serving engines. Translates algorithms into scalable infrastructure that improves latency, throughput, and model quality. Requires 3+ years of ML systems experience and an advanced degree or equivalent.
## Job Description
## Responsibilities

- **Advance inference efficiency end-to-end**
  - Design and prototype algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference.
  - Implement and maintain changes in high-performance inference engines (e.g., SGLang- or vLLM-style systems, speculative decoding such as ATLAS, quantization).
  - Profile and optimize performance across GPU, networking, and memory layers.

- **Unify inference with RL / post-training**
  - Design and operate RL and post-training pipelines (e.g., RLHF, RLAIF, GRPO, DPO-style methods, reward modeling).
  - Optimize RL workloads with inference-aware techniques such as async rollouts and speculative decoding.
  - Train, evaluate, and iterate on frontier models.
  - Co-design algorithms and infrastructure to identify bottlenecks.
  - Run ablations and scale-up experiments.

- **Own critical systems at production scale**
  - Profile, debug, and optimize under real workloads.
  - Drive roadmap items requiring engine modifications.
  - Establish metrics, benchmarks, and experimentation frameworks.

- **Provide technical leadership (Staff level)**
  - Set technical direction for cross-team efforts.
  - Mentor engineers and researchers.

## Requirements

Deep expertise in one or more areas with breadth to work across the stack:

- Bias toward implementation and shipping.
- Expertise in: large-scale inference systems (SGLang, vLLM), RL/post-training for LLMs (GRPO, RLHF), model architecture, distributed systems/HPC for ML.
- Strong Python coding skills and experience with performance profiling and optimization.
- Research foundation with a demonstrated track record (papers, open-source work, or production systems).

**Minimum qualifications**
- 3+ years of experience in ML systems, model training/inference, or equivalent.
- Advanced degree in Computer Science, EE, or related field, or equivalent.
- Experience owning complex technical projects end-to-end.

## Compensation

US base salary range: $200,000 - $280,000 + equity + benefits.
**Apply:** https://hotfix.jobs/jobs/research-engineer-core-ml-at-together-ai-c954a76d-dfac-4802-bfe5-753fc65176cd
**Canonical:** https://hotfix.jobs/jobs/research-engineer-core-ml-at-together-ai-c954a76d-dfac-4802-bfe5-753fc65176cd