Member of Technical Staff — Audio and Voice AI

Design, build, and deploy production-grade voice and audio AI systems, including real-time agents and speech-driven workflows for financial operations. Requires 5+ years of engineering experience with a focus on applied AI/ML or speech systems.

220k – 320k | San Francisco, CA | New York, NY | ML Engineering | Onsite | 5+ YOE

About the role

What You’ll Do

  • Build & Deploy Voice AI Systems: design and ship production-ready audio and voice-based AI features, including real-time voice agents and speech-driven workflows.
  • Craft High-Quality Voice UX: use modern speech-to-text, text-to-speech, and conversational AI platforms to create natural, responsive, and emotionally aware voice experiences tailored to financial use cases.
  • Adapt & Fine-Tune Audio and Multimodal Models: fine-tune and optimize speech, audio, and LLM-based models for accuracy, latency, and reliability in real-world environments.
  • Engineer Real-Time, Scalable AI Pipelines: build end-to-end AI/ML pipelines spanning audio ingestion, streaming inference, orchestration, and monitoring with enterprise-grade availability and performance.
  • Establish Evaluation & Monitoring Frameworks (LLMOps): design rigorous evaluation systems to measure quality, latency, accuracy, drift, and business outcomes for voice and text-based AI systems.
  • Automate Financial Workflows via Voice: develop AI-powered voice automations that reduce manual effort in collections, reconciliation, and customer communication.
  • Collaborate Cross-Functionally: partner with Product, Engineering, Design, and customers to translate business needs into effective, user-centered voice AI solutions.
  • Measure & Communicate Impact: define success metrics and continuously improve AI systems based on real-world usage and customer feedback.

You Might Be a Fit If You…

  • Have 5+ years of software engineering experience, with 2+ years focused on applied AI/ML, speech, or audio systems in production.
  • Have built and shipped voice, audio, or conversational AI systems used by real customers.
  • Have experience with speech-to-text, text-to-speech, audio processing, or multimodal models.
  • Have integrated and fine-tuned LLMs for conversational or agent-based systems.
  • Understand LLMOps / MLOps best practices, including deployment pipelines, monitoring, evaluation, and A/B testing.
  • Are fluent in Python and experienced with PyTorch, TensorFlow, Transformers, or audio ML frameworks.
  • Have built real-time or low-latency systems and understand the tradeoffs involved.
  • Can translate business and UX requirements into robust, scalable AI solutions.
  • Have experience integrating AI systems into existing enterprise or SaaS platforms.
  • Enjoy working on ambiguous problems where product definition, UX, and engineering meet.

Compensation

Top-of-market salary and equity package

Benefits (for U.S.-based full-time employees)

  • Medical, dental & vision insurance coverage for you
  • 401(k) & Match
  • Equity
  • Flexible PTO
  • Parental Leave

Skills

Python, PyTorch, TensorFlow, Transformers, Speech-To-Text, Text-To-Speech, Audio Processing, Multimodal Models, LLMOps, MLOps
