Senior Software Engineer, GenAI Platform

191k – 267k | United States | Remote | 5+ YOE | Mar 31

Summary

Leads development of Reddit's large-scale GenAI Platform, including LLM Gateway, RAG applications, and agentic AI workflows. Requires 5+ years in ML/AI platform engineering with expertise in cloud, Kubernetes, and MLOps practices.

About the role

What You’ll Do

Contribute to the design, implementation, and maintenance of the LLM Gateway, focusing on features such as unified API endpoints for internally and externally hosted LLMs, rate and token limit management, and intelligent failover mechanisms to improve uptime and reliability.
Design and develop ML and Generative AI systems in cloud-based production environments at scale.
Build and manage enterprise-grade RAG applications using embeddings, vector search, and retrieval pipelines.
Implement and operationalize agentic AI workflows with tool use, built on frameworks such as LangChain and LangGraph.
Drive adoption of MLOps / LLMOps practices, including CI/CD automation, versioning, testing, and lifecycle management.
Establish best practices for observability, monitoring, evaluation, and governance of GenAI pipelines in production.
Bring a strong ownership mindset and platform thinking.
Lead AI platform delivery from concept to production.

Who You Might Be

5+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
Experience operating orchestration systems such as Kubernetes at scale.
Deep experience with cloud-based technologies for supporting an ML platform, including tools such as AWS, Google Cloud Storage, and infrastructure-as-code (Terraform).
Proficiency with the common programming languages of ML, such as Go and Python.
Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders.
Strong focus on scalability, reliability, performance, and ease of use.
Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
Strong proficiency in Python and experience with modern AI/ML frameworks (e.g. LangChain, Vertex AI Agent Builder, TensorFlow, PyTorch) is a plus.

Skills

Python, Go, Kubernetes, AWS, Terraform, Google Cloud, LangChain, LangGraph, TensorFlow, PyTorch
