Senior Software Engineer, GenAI Platform
Leads development of Reddit's large-scale GenAI Platform, including LLM Gateway, RAG applications, and agentic AI workflows. Requires 5+ years in ML/AI platform engineering with expertise in cloud, Kubernetes, and MLOps practices.
What You’ll Do
- Contribute to the design, implementation, and maintenance of the LLM Gateway, focusing on features such as unified API endpoints for internally and externally hosted LLMs, rate and token limit management, and intelligent failover mechanisms that improve uptime and reliability.
- Design and develop ML and Generative AI systems in cloud-based production environments at scale.
- Build and manage enterprise-grade RAG applications using embeddings, vector search, and retrieval pipelines.
- Implement and operationalize tool-using agentic AI workflows with frameworks such as LangChain and LangGraph.
- Drive adoption of MLOps / LLMOps practices, including CI/CD automation, versioning, testing, and lifecycle management.
- Establish best practices for observability, monitoring, evaluation, and governance of GenAI pipelines in production.
- Bring a strong ownership mindset and platform thinking to the work.
- Lead AI platform delivery from concept to production.
Who You Might Be
- 5+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Experience operating orchestration systems such as Kubernetes at scale.
- Deep experience with cloud-based technologies for supporting an ML platform, such as AWS, Google Cloud Storage, and infrastructure-as-code tools (Terraform).
- Proficiency with the programming languages and frameworks commonly used in ML, such as Go and Python.
- Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders.
- Strong focus on scalability, reliability, performance, and ease of use.
- Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
- Strong proficiency in Python and experience with modern AI/ML frameworks (e.g. LangChain, Vertex AI Agent Builder, TensorFlow, PyTorch) is a plus.
Staff Software Engineer, AI Runtime
Staff Software Engineer building and scaling Databricks' managed large-scale GPU training platform (AIR). Focus on distributed training performance, scheduling, fault tolerance, and developer experience for thousands of accelerators.
Senior Software Engineer, AI Runtime
Senior Software Engineer building and scaling Databricks' managed GPU training platform (AI Runtime) for large-scale distributed AI model training. Requires 5+ years in distributed systems and hands-on experience with GPU training frameworks.
Sr. Machine Learning Engineer, Computer Vision
Build and prototype diffusion-based text-to-image generative models (Pinterest Canvas) using large-scale visual-text datasets. Requires 5+ years of industry computer vision experience and an M.S. or Ph.D.
Senior AI/ML Engineer
Senior AI/ML Engineer building transformer and deep learning models on financial and behavioral data to power personalized growth and marketing experiences at Chime. Requires strong production ML experience with PyTorch, AWS, and large-scale data infrastructure.