Principal Research Engineer, Post-Training

275k – 400k · Redwood City, CA · ML Engineering · Onsite · 7+ YOE · Jun 22

Summary

Lead technical vision and execution for post-training systems that transform foundation models into intelligent, engaging products. Drive alignment algorithms, RL, and infrastructure for large-scale LLM training and serving.

About the role

What You'll Do

Technical Leadership & Mentorship

Define and drive the technical roadmap for mid- and post-training systems, balancing research innovation with production reliability and scalability
Mentor and grow a team of researchers and engineers through technical guidance, design reviews, and career development
Establish best practices for experimentation, model development, and deployment

Research & Model Development

Lead the development of alignment algorithms, optimization techniques, and training objectives to improve model capabilities and data efficiency
Drive advances in mid- and post-training methodologies including reinforcement learning, preference optimization, supervised fine-tuning, and emerging alignment approaches
Identify and execute high-impact research opportunities that improve model behavior, safety, and user engagement
Develop robust evaluation frameworks and quality signals to measure real-world model performance

Systems & Infrastructure

Lead the design of efficient training and inference systems for large-scale generative models
Architect scalable data pipelines that transform diverse data sources into high-quality training datasets
Partner with infrastructure teams to optimize distributed training, GPU utilization, and serving efficiency
Drive improvements in experimentation platforms, data quality systems, and model observability

Required Qualifications

PhD in Computer Science, Machine Learning, AI, or a related field, or equivalent industry experience
Significant experience leading technical projects or teams in machine learning, AI research, or large-scale distributed systems
Experience scaling and mentoring high-performing research and engineering teams
Deep understanding of modern machine learning techniques, including transformers, reinforcement learning, alignment methods, and large language models
Strong track record of delivering impactful research or applied ML systems in production environments
Expertise in designing, building, and maintaining production-quality ML systems and infrastructure
Experience training, serving, debugging, and optimizing large-scale models on GPU-based systems
Experience leading teams working on large language model training, mid-training, or post-training
Experience with product experimentation, online evaluation, and A/B testing frameworks
Strong software engineering skills with the ability to write clean, maintainable, and scalable code
Excellent communication skills and the ability to influence technical direction across teams
Experience leading complex, cross-functional initiatives across data, training infrastructure, evaluation, and model serving

Nice to Have

Hands-on experience with open-source models such as Mistral and Qwen, particularly adapting them through mid- and post-training for specific personas, creative writing, or role-playing applications
Familiarity with cloud-native ML infrastructure, including Kubernetes, Docker, and modern orchestration platforms
Publications in leading machine learning conferences or demonstrated contributions to the broader AI community

Skills

Transformers, Reinforcement Learning, Large Language Models, Alignment Methods, GPU-based Systems, Distributed Training, Kubernetes, Docker, Python, A/B Testing
