Senior Manager, Infrastructure Platform Engineering

Lead a team building core infrastructure platform services for large-scale compute capacity allocation, state management, and security. Requires 10+ years of infrastructure experience and 3+ years in engineering leadership.

245k – 295k | San Francisco, CA | Sunnyvale, CA | Engineering Management | Onsite | 10+ YOE


About the role

What You'll Be Working On

Leading the team responsible for the platform services that abstract underlying infrastructure into reliable, allocatable capacity, and for the systems that track and reconcile state across a large fleet
Setting the technical roadmap across capacity and utilization intelligence, resource lifecycle and state management, and platform security and trust frameworks
Driving the design of secure, well-instrumented platform systems — from Kubernetes-based orchestration and automation to lower-level system and hardware integration
Hiring, mentoring, and growing a team of infrastructure software engineers; building a high-performing organization from a strong foundation
Partnering with infrastructure, production engineering, and security teams to align platform capabilities with operational reliability, capacity, and trust requirements
Improving platform efficiency and availability — characterizing bottlenecks, reducing stranded resources, and shortening operational and recovery cycles
Establishing engineering standards for infrastructure software development: code quality, testing, deployment safety, and on-call practices for systems that span the platform
Translating a vertically integrated infrastructure stack into reliable platform primitives that engineering teams can build on
Staying technically hands-on — reviewing designs, contributing to architecture decisions, and being credible to the engineers you lead

What You'll Bring to the Team

10+ years of experience in infrastructure or systems software development, including at least 3 years in an engineering leadership role
Deep expertise in large-scale infrastructure platforms — building services that pool, allocate, and reconcile compute resources at scale
Strong background with Kubernetes and cloud platforms (GCP, AWS, or Azure) — orchestration, automation, and operating distributed systems in production
Experience with distributed state management and control systems — modeling resource and system lifecycle, reconciling desired vs. actual state, and handling failure gracefully across a large fleet
Experience with efficiency, capacity, or performance engineering — characterizing system behavior, identifying bottlenecks, and driving measurable improvements in utilization or availability
A player-coach approach to management: hands-on enough to make technical calls, structured enough to grow a team and ship through them
Track record of hiring strong infrastructure engineers and helping them grow into more senior roles
Comfortable operating in a fast-moving environment where the path isn't fully paved — willing to drive ambiguity to clarity

Bonus Points

Experience operating Kubernetes on bare-metal infrastructure as well as on managed cloud services (GKE, EKS, AKS)
Familiarity with the operational challenges of GPU clusters, AI training, and inference workloads
Working knowledge of platform security and trust concepts — secure boot, measured boot, TPMs, and hardware attestation
Experience with capacity forecasting, demand modeling, or allocation optimization at scale
Hands-on background with telemetry and observability platforms at scale (Prometheus, OpenTelemetry, Grafana)
Prior experience building infrastructure platforms at hyperscalers or cloud providers where internal engineers are the primary customer
Familiarity with hardware-software co-design — understanding how platform choices affect physical infrastructure utilization

Benefits

Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer HSA contributions
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meal allowance
Additional perks & programs specific to location

Skills

Kubernetes, GCP, AWS, Azure, Infrastructure, Systems Software, Distributed Systems, Capacity Planning, Performance Engineering, Platform Security, Observability, Prometheus, OpenTelemetry, Grafana, Bare Metal
