Research Scientist - LLM

Conducts ML research to advance LLMs and audio models for real-time voice AI agents, focusing on reasoning, latency, and conversational quality. Prototypes models, designs evaluations, and bridges research to production systems. Requires strong PyTorch expertise and an experimental mindset.

$225k – $400k | Redwood City, CA | AI Research | Onsite

About the role

Key Responsibilities

  • Research & Experimentation – Explore and develop new techniques across LLMs and audio models to improve reasoning, latency, and conversational quality in real-time systems.
  • Model Prototyping – Rapidly build and iterate on experimental models and pipelines, turning research ideas into working prototypes.
  • Evaluation & Benchmarking – Design novel evaluation frameworks, datasets, and metrics to measure performance on complex, real-world voice tasks.
  • Bridge Research to Production – Collaborate closely with engineering to translate research insights into deployable systems.
  • Human Feedback Loops – Develop methods to incorporate human evaluation into model improvement, especially for subjective conversational quality.
  • Advance the Frontier – Stay at the cutting edge of ML research and bring new ideas into Retell’s product and infrastructure.

You Might Thrive If You

  • Strong ML Research Background – Have worked on advanced ML problems (e.g., LLM pre-training and post-training, transcription model training, text-to-speech model training, or multimodal systems).
  • Deep Technical Foundation – Are comfortable with PyTorch, model architectures, and the math behind modern machine learning.
  • Experimental Mindset – Enjoy exploring open-ended problems and iterating quickly on ideas.
  • Bridging Theory & Practice – Translate research into systems that work in real-world environments.
  • Startup-Ready – Thrive in fast-paced environments with high ownership and ambiguity.
  • Collaborative & Clear Communicator – Explain complex ideas and work cross-functionally to drive impact.

Compensation & Benefits

  • Cash: $225,000 - $400,000 base salary
  • Equity: Offered
  • Location: Redwood City, CA, US (100% Relocation Provided)
  • 100% coverage for medical, dental, and vision insurance
  • $70/day DoorDash credit for unlimited meals and snacks
  • $200/month wellness reimbursement
  • $300/month commuter reimbursement
  • $75/month phone bill reimbursement
  • $50/month internet reimbursement

Skills

LLMs, PyTorch, Audio Models, Machine Learning, Model Architectures, LLM Pre-Training, LLM Post-Training, Transcription Models, Text-to-Speech, Multimodal Systems