Machine Learning Engineer
Lead projects building and deploying large-scale ASR/NLP/LLM systems for meeting intelligence. Architect training, fine-tuning, and inference pipelines using PyTorch/JAX and own ML systems from research to production.
Your Impact
- Architect, build, and evolve large-scale SID / ASR / NLP / LLM systems that power mission-critical product experiences including summarization, chat, and speech understanding across millions of conversations.
- Lead the design and implementation of training, fine-tuning, post-training, and inference strategies for large language and speech models using PyTorch and/or JAX, making principled trade-offs across quality, latency, cost, and reliability.
- Design and improve model architectures, loss functions, decoding strategies, and training techniques for speech and language models, informed by both research and production constraints.
- Own end-to-end ML system lifecycles, from research prototyping through production deployment, monitoring, iteration, and long-term maintenance.
- Partner deeply with product and infrastructure teams to develop cutting-edge research and translate it into scalable, production-grade systems that deliver measurable user and business impact.
- Drive system-level improvements in model performance, robustness, observability, and operational excellence using real-world conversational data at scale.
- Set technical direction and best practices for ML infrastructure, data pipelines, evaluation frameworks, and deployment workflows in a cloud environment.
- Identify and resolve complex, ambiguous problems in model behavior, data quality, scaling, and system interactions, often before they surface as user-visible issues.
- Mentor and elevate other engineers, influencing team standards, reviewing designs, and contributing to a culture of strong technical decision-making and execution.
We're Looking for Someone Who
- Holds a Bachelor’s or Master’s degree in Computer Science or a related field with 3+ years of relevant industry experience; PhD is preferred.
- Has deep, hands-on experience building, fine-tuning, and post-training large language models or other foundation models, including an understanding of failure modes and trade-offs.
- Demonstrates strong command of modern ML research, with the ability to critically evaluate new papers and decide what is production-worthy versus experimental.
- Is interested in driving innovation and advancing applied research.
- Has extensive experience deploying, monitoring, and operating ML systems in production, including model versioning, rollback strategies, and performance regression detection.
- Is comfortable working with large-scale speech and conversational datasets, including data preprocessing, augmentation, quality analysis, and labeling strategies to support model training and evaluation.
- Has experience scaling ML systems across training, inference, and serving infrastructure while balancing cost, latency, and reliability constraints.
- Is highly effective at cross-functional collaboration, working end-to-end with product, infra, research, and data teams to deliver outcomes, not just models.
- Can lead technical projects independently, driving clarity in ambiguous problem spaces and making sound architectural decisions.
- Has experience with or strong interest in agentic systems, tool-use frameworks, or multi-model orchestration.
- Has significant experience in at least one of the following areas: (1) speech recognition (ASR), (2) text-to-speech (TTS), (3) multimodal (speech/text) foundation models, or (4) modern LLM-based NLP tasks (e.g., summarization, dialogue, speech understanding), especially in real-world production settings.
- Experience with personalization, recommendation systems, or user modeling is a plus.
Staff Machine Learning Engineer
Staff ML Engineer leading end-to-end identity verification ML systems, including document authenticity, face matching, liveness detection, GNN-based identity graphs, and behavioral risk models. Requires 8+ years of production ML experience and domain expertise in biometrics or fraud detection.
Staff ML Engineer
Founding Staff ML Engineer building production ML systems for governance, security, and agentic platform capabilities at Docker. Owns architecture, data pipelines, evaluation, and model lifecycle while mentoring the growing team.
Senior Research Engineer, Post-training & Evaluation
Own evaluation science and post-training methodology for Reddit's foundational LLMs. Define benchmarks, design model-as-a-judge systems, and set SFT recipes that turn base models into safe, Reddit-native endpoints.