Machine Learning Infrastructure Engineer
Designs, builds, and maintains ML training and serving infrastructure and supports research teams. Requires 4+ years of ML infrastructure experience, hands-on experience with cloud platforms and tooling such as Google Cloud and Kubernetes, and experience working with GPUs.
Responsibilities
- Provide infrastructure support to our ML research and product teams
- Build tooling to diagnose cluster issues and hardware failures
- Monitor deployments, manage experiments, and provide general support to our research teams
- Maximize GPU allocation and utilization for both serving and training
Requirements
- 4+ years of experience supporting infrastructure in an ML environment
- Experience developing tools to diagnose ML infrastructure problems and failures
- Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)
- Experience working with GPUs
Nice to Have
- Experience with large GPU clusters and high-performance computing/networking
- Experience with supporting large language model training
- Experience with ML frameworks such as PyTorch, TensorFlow, or JAX
- Experience with GPU kernel development
Machine Learning Engineer - Simulation Framework
Machine Learning Engineer focused on GPU-based simulation frameworks, reinforcement learning, and closing the sim-to-real gap for autonomous vehicle safety validation. Requires an MS or PhD and strong C++ and Python experience.
Senior AI Engineer
Build full-stack AI systems, including agentic workflows, RAG pipelines, and production infrastructure, for mental healthcare applications. Requires 2+ years of software engineering experience and 1+ year of experience with LLMs or agentic AI.
Staff AI Engineer
Staff AI Engineer building and shipping LLM/agent-powered observability features for incident detection, triage, and resolution. Requires strong production software engineering experience plus practical GenAI/LLM application skills.
Senior AI Engineer
Build and ship AI-powered observability features using LLMs and agent workflows to help users detect, triage, and resolve incidents. Requires strong production software engineering experience plus practical GenAI application skills.
Staff Software Engineer, Trends Machine Learning Infrastructure
Lead technical direction for Pinterest's unified AI-powered Trends and Audience Insights platform. Architect scalable ML data pipelines and LLM capabilities while mentoring engineers and driving cross-team integrations.