Member of Technical Staff — Audio and Voice AI

Design, build, and deploy production-grade voice and audio AI systems, including real-time agents and speech-driven workflows for financial operations. Requires 5+ years of engineering experience with a focus on applied AI/ML or speech systems.

220k – 320k | San Francisco, CA | New York, NY | ML Engineering | Onsite | 5+ YOE

About the role

What You’ll Do

Build & Deploy Voice AI Systems: design and ship production-ready audio and voice-based AI features, including real-time voice agents and speech-driven workflows.
Craft High-Quality Voice UX: use modern speech-to-text, text-to-speech, and conversational AI platforms to create natural, responsive, and emotionally aware voice experiences tailored to financial use cases.
Adapt & Fine-Tune Audio and Multimodal Models: fine-tune and optimize speech, audio, and LLM-based models for accuracy, latency, and reliability in real-world environments.
Engineer Real-Time, Scalable AI Pipelines: build end-to-end AI/ML pipelines spanning audio ingestion, streaming inference, orchestration, and monitoring with enterprise-grade availability and performance.
Establish Evaluation & Monitoring Frameworks (LLMOps): design rigorous evaluation systems to measure quality, latency, accuracy, drift, and business outcomes for voice and text-based AI systems.
Automate Financial Workflows via Voice: develop AI-powered voice automations that reduce manual effort in collections, reconciliation, and customer communication.
Collaborate Cross-Functionally: partner with Product, Engineering, Design, and customers to translate business needs into effective, user-centered voice AI solutions.
Measure & Communicate Impact: define success metrics and continuously improve AI systems based on real-world usage and customer feedback.

You Might Be a Fit If You…

Have 5+ years of software engineering experience, with 2+ years focused on applied AI/ML, speech, or audio systems in production.
Have built and shipped voice, audio, or conversational AI systems used by real customers.
Have experience with speech-to-text, text-to-speech, audio processing, or multimodal models.
Have integrated and fine-tuned LLMs for conversational or agent-based systems.
Understand LLMOps / MLOps best practices, including deployment pipelines, monitoring, evaluation, and A/B testing.
Are fluent in Python and experienced with PyTorch, TensorFlow, Transformers, or audio ML frameworks.
Have built real-time or low-latency systems and understand the tradeoffs involved.
Can translate business and UX requirements into robust, scalable AI solutions.
Have experience integrating AI systems into existing enterprise or SaaS platforms.
Enjoy working on ambiguous problems where product definition, UX, and engineering meet.

Compensation

Top-of-market salary and equity package

Benefits (for U.S.-based full-time employees)

Medical, dental & vision insurance coverage for you
401(k) & Match
Equity
Flexible PTO
Parental Leave

Skills

Python, PyTorch, TensorFlow, Transformers, Speech-to-Text, Text-to-Speech, Audio Processing, Multimodal Models, LLMOps, MLOps
