Staff Machine Learning Systems Engineer, Embeddings Platform
Staff ML Systems Engineer leading large-scale embedding and recommendation model architecture, distributed training, and real-time serving. Owns ML strategy and mentors engineers on personalization systems.
What You’ll Do
- Architect and lead the development of next-generation, large-scale machine learning techniques.
- Define and execute the ML strategy, identifying opportunities to enhance personalization and recommendation quality across Reddit.
- Lead research initiatives on scalable machine learning systems and real-time model adaptation, bringing cutting-edge advancements into production.
- Partner with ML infrastructure teams to build high-performance, distributed training systems that efficiently scale across multiple GPUs and cloud environments.
- Establish and optimize real-time serving architectures for large-scale embeddings, ensuring low-latency inference and high throughput.
- Collaborate cross-functionally with teams in Feed Ranking, Ads, Content Understanding, and Core ML to integrate ML models into Reddit’s key AI-driven systems.
- Mentor and guide senior and mid-level ML engineers, fostering a culture of excellence, innovation, and knowledge sharing.
- Stay at the forefront of AI research, evaluating and introducing new modeling paradigms to keep Reddit’s ML ecosystem cutting-edge.
- Drive technical discussions, present findings to leadership, and contribute to long-term ML planning and decision-making.
Who You Might Be
- 8+ years of experience in machine learning engineering, with a strong focus on large-scale ML systems and recommendation or personalization systems.
- Expertise in modern deep learning architectures, including sequence models and foundation models.
- Deep understanding of complex multi-entity relationships in machine learning applications and how they are modeled in large-scale systems.
- Proven ability to design, implement, and optimize scalable ML architectures, from distributed training to real-time inference.
- Strong software engineering skills in Python, C++, or similar languages, with experience in ML infrastructure, high-performance computing, and cloud-based ML pipelines.
- Demonstrated leadership in driving ML strategy, mentoring engineers, and influencing cross-functional teams.
- Experience with A/B testing, model evaluation frameworks, and real-time feedback loops in large-scale production systems.
- Excellent communication skills, with the ability to effectively present complex ML concepts to technical and non-technical stakeholders.
Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401(k) with Employer Match
- Global benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
Staff Machine Learning Engineer
Staff ML Engineer leading end-to-end identity verification ML systems including document authenticity, face matching, liveness detection, GNN-based identity graphs, and behavioral risk models. Requires 8+ years production ML experience and domain expertise in biometrics or fraud detection.
Staff ML Engineer
Founding Staff ML Engineer building production ML systems for governance, security, and agentic platform capabilities at Docker. Owns architecture, data pipelines, evaluation, and model lifecycle while mentoring the growing team.
Principal Engineer, AI Platform
Principal Engineer setting technical vision and building AI/ML infrastructure for Generative AI and Recommender Systems at Pinterest, scaling to hundreds of millions of inferences per second. Requires deep expertise in distributed systems and proven cross-org technical leadership.
Senior Research Engineer, Post-training & Evaluation
Own evaluation science and post-training methodology for Reddit's foundational LLMs. Define benchmarks, design model-as-a-judge systems, and set SFT recipes that turn base models into safe, Reddit-native endpoints.