Staff+ Software Engineer, Platform
Staff-level software engineer who builds and scales platform infrastructure across teams, including developer tools, service infrastructure, multicloud, auth, connectivity, API distributability, and ML adaptation systems. Requires 8+ years of full-stack experience, including Staff-level leadership, with a focus on robust, scalable solutions in a fast-paced AI environment.
What you'll do
Platform Acceleration
- Architect and optimize critical development infrastructure including dev environments, observability, and CI/CD pipelines
- Partner with product teams to understand workflows and eliminate friction points
Service Infra
- Build and maintain core infrastructure: service mesh, observability systems, deployment pipelines, shared libraries
- Enable product teams to build and operate reliable services at scale
Multicloud
- Build infrastructure that spans multiple cloud providers: cloud-agnostic tooling, cross-cloud networking, and multi-region deployments
Auth & Identity
- Build scalable solutions for user authentication, authorization (including RBAC), and SSO
- Partner with product, security, support, and trust & safety teams
Connectivity
- Own the MCP proxy, OAuth/token management, the MCP spec, and the Python/TypeScript SDKs
- Handle token refresh at scale, admin controls, and proxy infrastructure
API Distributability
- Transform the Claude API into a cloud-native managed product that runs cross-cloud and on-prem and meets enterprise security and compliance requirements
Platform Intelligence
- Build training systems for customer-specific Claude adaptation
- Work on ML training infrastructure and production ML pipelines
You might be a good fit if you
- Have 8-10+ years of practical full-stack engineering experience, ideally including 2+ years at the Staff level
- Have led the design and delivery of complex user-facing products across the full stack
- Are a technical expert in modern frontend and backend technologies (e.g., React, TypeScript)
- Are product-focused, building robust, scalable, easy-to-use solutions
- Have experience in fast-moving environments, building 0-to-1 products
- Invest in mentoring your peers and supporting their growth
- Drive cross-team alignment and influence without authority
- Have established engineering standards, component architectures, and best practices
- Thrive in fast-paced, ambiguous environments
Strong candidates may also
- Have served as technical lead or architect for foundational platform systems
- Have designed and scaled billing/payments systems at high volumes
- Have experience with containerization and secure execution environments
- Have built identity and access management (auth, SSO, RBAC) at enterprise scale
- Have worked on ML/AI systems, LLM inference, or model serving
- Have built multi-cloud, cross-region architectures
- Have designed APIs with a focus on developer experience
Performance Engineer, Inference Systems
Performance engineer focused on cross-layer investigations of Anthropic's inference fleet for Claude, optimizing throughput, latency, reliability, and correctness while building observability and partnering with kernel and serving teams.
Tech Lead, Deployment & Operations — Custom Infrastructure
Leads the deployment and operation of OpenAI’s custom silicon and systems in data center environments. Drives hardware bring-up, validation, production deployment, and fleet reliability at scale while leading a technical team.
Staff+ Software Engineer, Developer Productivity
Leads technical strategy and builds scalable developer infrastructure, including build systems, CI/CD pipelines, and tooling for large monorepo environments. Requires 3+ years of leading complex projects, proficiency in Python/Rust/Go, and experience with container orchestration.
Software Engineer, Developer Productivity, AI Tools
Builds and maintains AI-powered developer productivity tools, including coding agents, secure sandboxes, and standardized environments, to accelerate internal software development workflows while ensuring security and quality.
Site Reliability Engineer (SRE)
Site Reliability Engineer who drives end-to-end reliability for Tinker, an AI fine-tuning platform, including SLOs, monitoring, incident response, and multi-tenant GPU scheduling. Requires distributed systems experience, strong software engineering skills applied to reliability, and experience handling production incidents.