Research Engineer, Core ML

Research Engineer building production ML systems at the intersection of efficient inference, RL/post-training, and serving engines. Translates algorithms into scalable infrastructure that improves latency, throughput, and model quality. Requires 3+ years of ML systems experience and an advanced degree or equivalent.

200k – 280k | San Francisco, CA | ML Engineering | Onsite | 3+ YOE


About the role

Responsibilities

Advance inference efficiency end-to-end
- Design and prototype algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference.
- Implement and maintain changes in high-performance inference engines (e.g., SGLang- or vLLM-style systems), including speculative decoding (e.g., ATLAS) and quantization.
- Profile and optimize performance across GPU, networking, and memory layers.
Unify inference with RL / post-training
- Design and operate RL and post-training pipelines (e.g., RLHF, RLAIF, GRPO, DPO-style methods, reward modeling).
- Optimize RL workloads with inference-aware techniques like async rollouts and speculative decoding.
- Train, evaluate, and iterate on frontier models.
- Co-design algorithms and infrastructure, and identify bottlenecks across both.
- Run ablations and scale-up experiments.
Own critical systems at production scale
- Profile, debug, and optimize under real workloads.
- Drive roadmap items requiring engine modifications.
- Establish metrics, benchmarks, and experimentation frameworks.
Provide technical leadership (Staff level)
- Set technical direction for cross-team efforts.
- Mentor engineers and researchers.

Requirements

Deep expertise in one or more areas with breadth to work across the stack:

- Bias toward implementation and shipping.
- Expertise in one or more of: large-scale inference systems (SGLang, vLLM), RL/post-training for LLMs (GRPO, RLHF), model architecture, distributed systems/HPC for ML.
- Strong Python coding and performance profiling/optimization skills.
- Research foundation with a track record (papers, open-source work, or production systems).

Minimum qualifications

- 3+ years of experience in ML systems, model training/inference, or equivalent.
- Advanced degree in Computer Science, EE, or a related field, or equivalent.
- Experience owning complex technical projects end-to-end.

Compensation

US base salary range: $200,000 - $280,000 + equity + benefits.

Skills

Python, SGLang, vLLM, RLHF, GRPO, DPO, Speculative Decoding, ATLAS, GPU Optimization, Distributed Systems
