Technical Program Manager, Compute

Drives planning, coordination, and execution of compute infrastructure programs at scale, owning workstreams from procurement to allocation. Partners cross-functionally with engineering, research, and finance teams in a fast-paced AI environment.

365k – 435k | San Francisco, CA | New York, NY | Seattle, WA | Technical Program Management | Hybrid | 7+ YOE

About the role

Responsibilities:

Own and drive critical programs across the compute lifecycle, coordinating execution across multiple engineering, research, and operations teams
Build and maintain operational visibility into the compute fleet, ensuring the organization has a clear picture of supply, demand, utilization, and health
Lead cross-functional coordination for compute transitions: bringing new capacity online, migrating workloads, and managing decommissions across cloud providers and hardware platforms
Partner with engineering and research leadership to navigate competing priorities and drive alignment on how compute resources are planned, allocated, and used
Identify and close operational gaps across the compute pipeline, whether through new tooling, improved processes, or better cross-team communication
Own trade-off discussions between utilization, cost, latency, and reliability, synthesizing inputs from technical and business stakeholders and communicating decisions to leadership
Develop and improve the processes and frameworks the team uses to plan, track, and execute compute programs at increasing scale and complexity

You may be a good fit if you:

Have 7+ years of technical program management experience in infrastructure, platform engineering, or compute-intensive environments
Have led complex, cross-functional programs involving multiple engineering teams with competing priorities and ambiguous requirements
Have experience working with research or ML teams and translating their needs into operational plans and technical requirements
Are comfortable diving deep into technical details (cloud infrastructure, cluster management, job scheduling, resource orchestration) while maintaining program-level visibility
Thrive in ambiguous, fast-moving environments where you need to define scope and build processes from the ground up
Have strong communication skills and can engage credibly with engineers, researchers, finance, and executive leadership
Have a track record of building trust with engineering teams and driving changes through influence rather than authority

Strong candidates may also have:

Experience managing compute capacity across multiple cloud providers (AWS, GCP, Azure) or hybrid cloud/on-premise environments
Familiarity with job scheduling, resource orchestration, or workload management systems (Kubernetes, Slurm, Borg, YARN, or custom schedulers)
Experience with GPU or accelerator infrastructure, including the unique challenges of large-scale ML training and inference workloads
Experience building or improving observability for infrastructure systems: dashboards, alerting, efficiency metrics, or cost attribution
Capacity planning experience, including demand forecasting, cost modeling, or hardware lifecycle management
Experience scaling through hypergrowth in AI/ML, HPC, or large-scale cloud environments

Annual Salary: $365,000 — $435,000 USD

Skills

Kubernetes, AWS, GCP, Azure, Slurm, Borg, YARN, GPU, Cluster Management, Job Scheduling

Similar roles

Anthropic

Technical Program Manager, API Platform

Drive scaling and efficiency programs for Anthropic's API stack, coordinating across infrastructure, networking, and inference teams to deliver complex, cross-functional initiatives. Requires 7+ years of TPM experience with API platforms and distributed systems.

365k – 435k | San Francisco, CA +1 | Technical Program Management | Hybrid | 7+ YOE | Networking | API Platforms

Anthropic

Technical Program Manager, Data Center Infrastructure

Drive cross-functional execution of Anthropic's data center programs from construction through commissioning. Manage multi-site infrastructure programs, lead TPM teams, and coordinate with contractors and internal engineering teams.

365k – 435k | San Francisco, CA +2 | Technical Program Management | Hybrid | 7+ YOE | Risk Management | Schedule Tracking

OpenAI

Token-as-a-Service Technical Program Manager

Leads end-to-end delivery of external compute capacity into production-ready tokens for OpenAI model workloads. Drives cross-functional programs across engineering, partners, and operations, requiring 8+ years TPM experience and strong infrastructure knowledge.

342k – 555k | San Francisco, CA +1 | Technical Program Management | Hybrid | 8+ YOE | Hardware | Networking

Anthropic

Technical Program Manager, Marketing Technology

Leads Marketing Mix Modeling, incrementality testing, brand measurement, and marketing data infrastructure programs. Coordinates cross-functional teams including Data Science, Engineering, and vendors to deliver measurement capabilities and automation solutions. Requires 7+ years TPM experience with marketing analytics expertise.

290k – 365k | San Francisco, CA +1 | Technical Program Management | Hybrid | 7+ YOE | MMM | CDP

OpenAI

Device Safety & Risk Operations Specialist

Build and operate the end-to-end safety and risk model for new consumer hardware, defining workflows, playbooks, and tooling from launch through scaled operations. Requires 8+ years in device or product safety with strong technical fluency and operational judgment.

252k – 335k | San Francisco, CA | Technical Program Management | Hybrid | 8+ YOE | AI Tools | Telemetry