Responsibilities
- Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows.
- Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques.
- Build and iterate on a comprehensive evaluation framework covering objective metrics, human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure.
- Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and handle the full lifecycle from prototype to global-scale deployment.
Basic Qualifications
- Python expert with deep proficiency in writing clean, efficient code for AI/ML systems.
- Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.
- Proficiency in pre-training and post-training speech-language models using JAX/PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.
- Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing.
- Experience building or working with large-scale distributed training and inference systems on Kubernetes.
- Proactive, self-driven attitude — ready to grind in a fast-paced, high-caliber team.
Compensation and Benefits
$150,000 - $450,000 USD base salary, plus equity; comprehensive medical, vision, and dental coverage; 401(k); short- and long-term disability insurance; life insurance; and various perks.