# Machine Learning Researcher, Audio
**Company:** [Bland AI](https://hotfix.jobs/companies/bland-ai)
**Location:** Remote
**Salary:** $140K-$250K
**Skills:** Text-to-Speech, TTS, Speech-to-Text, STT, Neural Audio Codecs, Self-Supervised Learning, Generative Modeling, ASR, PyTorch, Distributed Training, GPU Training, Inference Optimization
**Posted:** 2026-04-20
> Conducts foundational research and develops scalable ML models for speech-to-text, text-to-speech, and neural audio codecs in real-time voice AI agents. Requires deep expertise in voice modeling, self-supervised learning, and production deployment at enterprise scale.
## Job Description
## What You Will Do

### Build and Scale Next-Generation TTS Systems
- Design and train large-scale text-to-speech models capable of expressive, controllable, human-sounding output.
- Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.
- Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.
- Optimize for real-time, low-latency inference in production.

### Advance Speech-to-Text Modeling
- Build and fine-tune large-scale ASR systems robust to accents, noise, telephony artifacts, and code-switching.
- Leverage self-supervised pretraining and large-scale weak supervision.
- Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.

### Pioneer Neural Audio Codecs
- Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
- Explore discrete and continuous latent representations for scalable speech modeling.
- Design codec architectures that enable downstream generative modeling and controllable synthesis.

### Develop Scalable Training Pipelines
- Curate and process massive audio datasets across languages, speakers, and environments.
- Design staged training curricula and data filtering strategies.
- Scale training across distributed GPU clusters, with a focus on cost, throughput, and reliability.

### Run Rigorous Experiments
- Design ablation studies that isolate the impact of architectural changes.
- Measure improvements using both objective metrics and perceptual evaluations.
- Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.

## What Makes You a Great Fit

### Deep Research Foundations
- Experience with self-supervised learning, multimodal modeling, or generative modeling.
- Ability to derive new formulations and implement them efficiently.

### Expertise in Voice Modeling
- Hands-on experience building or scaling TTS, STT, or neural audio codec systems.
- Familiarity with large-scale speech datasets and real-world audio variability.
- Strong intuition for audio quality, prosody, and conversational dynamics.

### Systems and Hardware Awareness
- Experience training and serving large models on modern accelerators.
- Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
- Understanding of real-time constraints in telephony or streaming environments.

### Experimental Rigor
- Track record of designing controlled experiments and meaningful ablations.
- Comfort working with both offline benchmarks and live production metrics.
- Ability to move quickly from hypothesis to validation.

**Bonus Points**
- Experience with large-scale distributed training.
- Research publications or open-source contributions in speech or language AI.
- Background in real-time speech systems or telephony.
- PhD in ML, AI, or a related field, or equivalent research impact.

## Benefits and Compensation
- Healthcare, dental, vision.
- Meaningful equity in a fast-growing company.
- Every tool you need to succeed.
- Beautiful office in Jackson Square, SF with rooftop views.
- Competitive salary: $160,000 to $250,000.

**Apply:** https://hotfix.jobs/jobs/copy-of-machine-learning-researcher-audio-at-bland-ai-e4387bb2-3e29-4a80-a7f2-1e5bedf0c698
**Canonical:** https://hotfix.jobs/jobs/copy-of-machine-learning-researcher-audio-at-bland-ai-e4387bb2-3e29-4a80-a7f2-1e5bedf0c698