Machine Learning Researcher, Audio

Conducts foundational research and develops scalable ML models for speech-to-text, text-to-speech, and neural audio codecs in real-time voice AI agents. Requires deep expertise in voice modeling, self-supervised learning, and production deployment at enterprise scale.

140k – 250k | San Francisco, CA | AI Research | Remote

About the role

What You Will Do

Build and Scale Next-Generation TTS Systems

  • Design and train large-scale text-to-speech models capable of expressive, controllable, human-sounding output.
  • Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.
  • Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.
  • Optimize for real-time, low-latency inference in production.

Advance Speech-to-Text Modeling

  • Build and fine-tune large-scale ASR systems robust to accents, noise, telephony artifacts, and code-switching.
  • Leverage self-supervised pretraining and large-scale weak supervision.
  • Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.

Pioneer Neural Audio Codecs

  • Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
  • Explore discrete and continuous latent representations for scalable speech modeling.
  • Design codec architectures that enable downstream generative modeling and controllable synthesis.

Develop Scalable Training Pipelines

  • Curate and process massive audio datasets across languages, speakers, and environments.
  • Design staged training curricula and data filtering strategies.
  • Scale training across distributed GPU clusters, with a focus on cost, throughput, and reliability.

Run Rigorous Experiments

  • Design ablation studies that isolate the impact of architectural changes.
  • Measure improvements using both objective metrics and perceptual evaluations.
  • Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.

What Makes You a Great Fit

Deep Research Foundations

  • Experience with self-supervised learning, multimodal modeling, or generative modeling.
  • Ability to derive new formulations and implement them efficiently.

Expertise in Voice Modeling

  • Hands-on experience building or scaling TTS, STT, or neural audio codec systems.
  • Familiarity with large-scale speech datasets and real-world audio variability.
  • Strong intuition for audio quality, prosody, and conversational dynamics.

Systems and Hardware Awareness

  • Experience training and serving large models on modern accelerators.
  • Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
  • Understanding of real-time constraints in telephony or streaming environments.

Experimental Rigor

  • Track record of designing controlled experiments and meaningful ablations.
  • Comfort working with both offline benchmarks and live production metrics.
  • Ability to move quickly from hypothesis to validation.

Bonus Points

  • Experience with large-scale distributed training.
  • Research publications or open-source contributions in speech or language AI.
  • Background in real-time speech systems or telephony.
  • PhD in ML, AI, or a related field, or equivalent research impact.

Benefits and Compensation

  • Healthcare, dental, and vision coverage.
  • Meaningful equity in a fast-growing company.
  • Every tool you need to succeed.
  • Beautiful office in Jackson Square, SF with rooftop views.
  • Competitive salary: $160,000 to $250,000.

Skills

Text-to-Speech (TTS), Speech-to-Text (STT), ASR, Neural Audio Codecs, Self-Supervised Learning, Generative Modeling, PyTorch, Distributed Training, GPU Training, Inference Optimization
