Staff+ Software Engineer, Platform

405k – 485k · San Francisco, CA; New York, NY; Seattle, WA · Hybrid · 8+ YOE · Apr 8

Summary

Staff-level software engineer who builds and scales platform infrastructure used across teams, including developer tools, service infrastructure, multicloud, auth, connectivity, API distributability, and ML adaptation systems. Requires 8+ years of full-stack experience, including Staff-level leadership, with a focus on robust, scalable solutions in a fast-paced AI environment.

About the role

What you'll do

Platform Acceleration

Architect and optimize critical development infrastructure including dev environments, observability, and CI/CD pipelines
Partner with product teams to understand workflows and eliminate friction points

Service Infra

Build and maintain core infrastructure: service mesh, observability systems, deployment pipelines, shared libraries
Enable product teams to build and operate reliable services at scale

Multicloud

Build infrastructure that spans multiple cloud providers: cloud-agnostic tooling, cross-cloud networking, multi-region deployments

Auth & Identity

Build scalable solutions for user authentication, authorization, RBAC, and SSO
Work with product, security, support, and trust & safety teams

Connectivity

Own the MCP proxy, OAuth/token management, the MCP spec, and the Python/TypeScript SDKs
Handle token refresh at scale, admin controls, and proxy infrastructure

API Distributability

Transform the Claude API into a cloud-native managed product: cross-cloud, on-prem, enterprise security/compliance

Platform Intelligence

Build training systems for customer-specific Claude adaptation
Work on ML training infrastructure and production ML pipelines

You might be a good fit if you

Have 8-10+ years of practical full-stack engineering experience, ideally including 2+ years at Staff level
Have led the design and delivery of complex user-facing products across the full stack
Are a technical expert in modern frontend and backend technologies (e.g., React, TypeScript)
Are product-focused, building robust, scalable, easy-to-use solutions
Have experience in fast-moving environments building 0-to-1 products
Invest in the mentorship and growth of peers
Drive cross-team alignment and influence without authority
Have established engineering standards, component architectures, and best practices
Thrive in fast-paced, ambiguous environments

Strong candidates may also

Have served as technical lead or architect for foundational platform systems
Have designed and scaled billing/payments systems at high volume
Have experience with containerization and secure execution environments
Have built identity and access management (auth, SSO, RBAC) at enterprise scale
Have worked on ML/AI systems, LLM inference, or model serving
Have built multi-cloud or cross-region architectures
Have done API design with a focus on developer experience

Skills

React, TypeScript, CI/CD, Kubernetes, OAuth, Service Mesh, Observability, Multi-cloud, RBAC, SSO, ML Training, LLM Inference, API Gateways, Python, TypeScript SDK

Similar roles at this salary range


Anthropic

May 20

Performance Engineer, Inference Systems

Performance engineer focused on cross-layer investigations of Anthropic's inference fleet for Claude, optimizing throughput, latency, reliability, and correctness while building observability and partnering with kernel and serving teams.

350k – 850k · San Francisco, CA +2 · DevOps / SRE · Hybrid · SQL, Python

OpenAI

May 16

Tech Lead, Deployment & Operations — Custom Infrastructure

Lead deployment and operations for OpenAI’s custom silicon and systems into data center environments. Drive hardware bring-up, validation, production deployment, and fleet reliability at scale while leading a technical team.

342k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Tooling, Automation

Anthropic

May 5

Staff+ Software Engineer, Developer Productivity

Leads technical strategy and builds scalable developer infrastructure including build systems, CI/CD pipelines, and tooling for large monorepo environments. Requires 3+ years leading complex projects, proficiency in Python/Rust/Go, and experience with container orchestration.

405k – 625k · San Francisco, CA +2 · DevOps / SRE · Hybrid · Go, Nix

Thinking Machines Lab

May 4

Software Engineer, Developer Productivity, AI Tools

Builds and maintains AI-powered developer productivity tools, including coding agents, secure sandboxes, and standardized environments to accelerate internal software development workflows while ensuring security and quality.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · uv, TGI

Thinking Machines Lab

May 4

Site Reliability Engineer (SRE)

Site Reliability Engineer drives end-to-end reliability for Tinker, an AI fine-tuning platform, including SLOs, monitoring, incident response, and multi-tenant GPU scheduling. Requires distributed systems experience, software engineering skills applied to reliability, and experience handling production incidents.

350k – 475k · San Francisco, CA · DevOps / SRE · On-site · SLOs, CI/CD
