Sr. Production Engineer, Solutions Engineering
Senior Production Engineer building AI agents, platforms, and automation to ensure the reliability of Pinterest's large-scale distributed systems serving hundreds of millions of users.
What you’ll do
- Design and build AI agents that augment production reliability work — Develop agents that assist engineers with service health analysis, reliability recommendations, migration playbook generation, and risk identification
- Drive large-scale infrastructure modernization with AI-accelerated execution — Lead Kubernetes adoption and platform transitions using AI to generate automation
- Transform consulting patterns into scalable platforms — Execute scoped reliability engagements with engineering teams, then encode successful approaches into AI-assisted tools and automation
- Build the knowledge infrastructure that powers Pinterest's operational agent ecosystem — Create migration playbooks, operational runbooks, incident patterns, and best practices
- Develop software solutions to enable reliability and operability of large-scale distributed systems
- Build tools and automation to eliminate toil and reduce operational overhead
- Build meaningful, insightful and actionable SLIs — Develop service level indicators that provide clear signals of system health
- Automate critical portions of Pinterest's engineering processes
- Manage capacity and performance to help scale our infrastructure — Partner with teams to plan and optimize capacity across public and private clouds
What we’re looking for
- 5+ years of industry experience building and operating large-scale, high-performance distributed systems
- Bachelor's degree in Computer Science or related field, or equivalent experience
- Strong programming skills in Python or Go — ability to build production-grade platforms, agents, and automation
- Deep knowledge of Linux/Unix internals and experience with open source infrastructure (MySQL, Kafka, Envoy, Hadoop, etc.)
- Infrastructure as Code experience (Terraform, Puppet, Chef, Ansible, Docker, Kubernetes)
- Experience deploying web applications to cloud infrastructure (AWS, GCP, or Azure) and working with distributed, service-oriented architecture
Preferred
- Experience developing AI agents for infrastructure automation, operational decision-making, or reliability workflows
- AI/ML infrastructure experience (LLM-based systems, model serving, agentic workflows)
- Technical consulting or embedded SRE experience with cross-functional engineering teams
Senior Data Engineer, Sentinel (Pacific Time Zone)
Senior Infrastructure Engineer building and operating AWS cloud infrastructure for a healthcare data platform. Requires Python, Terraform, CI/CD expertise, and big data tools experience.
Software Engineer, Infrastructure
Build and operate foundational data infrastructure including Airflow, Flink, DynamoDB, and RDS using Terraform and Kubernetes. Requires 2-4 years of infrastructure/platform experience and strong Python skills.
Software Engineer, Developer Experience
Build internal AI tools and autonomous agents that embed into Retool's engineering workflows to boost developer productivity and reduce toil. Requires shipping real AI-powered developer tools and infrastructure.
Senior Asset Pipeline Engineer
Design and own the OpenUSD-based asset pipeline for a high-fidelity sensor simulation platform. Build automated DCC-to-engine pipelines, custom schemas, material conversion, and validation systems at library scale.