Senior Engineering Manager, Compute
Lead and grow a high-ownership engineering team building and operating large-scale, multi-tenant compute platforms for frontier AI workloads. Own strategy, architecture, operations, capacity planning, and live-site reliability for next-generation compute infrastructure.
Responsibilities
- Own the strategy and standards of excellence for the compute layer that the world's agents run on, across design, delivery, and operations. Build a culture of ownership, quality, and customer-first decision-making.
- Lead, hire, and grow a high-ownership team; roll up your sleeves and go deep into the trenches, staying close to design docs and code rather than managing from a distance. Coach engineers, level them up, and clear the friction that slows them down.
- Drive the arc from today's compute toward the next generation of compute platforms. Ground prioritization in customer and design-partner feedback, and turn ambiguous, fast-moving requirements into predictable, iterative delivery.
- Own operations, run on-call and incident response, and drive blameless postmortems and the systemic fixes that prevent recurrence.
- Guide the hard architectural decisions for large-scale, multi-tenant compute, where technical concerns cut across workload isolation and security, scheduling, fleet efficiency / utilization / goodput, and performance, while ensuring the platform is reliable and efficient for the workloads that depend on it.
- Own utilization, capacity and supply planning, and the cost-per-unit-of-compute and margin profile of the fleet, across CPU compute today and accelerated compute ahead.
- Partner with leadership, Product, SDK, UX/DX, Security, and design-partner customers to align priorities and unblock delivery. Communicate progress, tradeoffs, and risk clearly to technical and non-technical audiences alike.
Requirements
- Proven experience leading software engineering teams that build and operate large-scale compute platforms or fleets, with strong operational practices.
- 12+ years in software and/or infrastructure engineering, including 7+ years of people management and demonstrated ownership of delivery and live-site outcomes.
- Deep expertise in distributed systems and compute infrastructure, with the hands-on judgment to guide architecture and execution up close rather than from a distance.
- Experience operating multi-tenant compute that other people's production workloads depend on.
- Bachelor's degree in Computer Science or related field, or equivalent practical experience; advanced degree a plus.
- Excellent communication skills, with the ability to partner across engineering, product, and leadership and fold customer feedback into the roadmap.
- Strong leadership, coaching, and performance management; ability to grow engineers and build a healthy, accountable, high-ownership team.
- Excellence in execution: planning, prioritization, and delivering iterative milestones in an ambiguous, fast-moving environment while managing unplanned work.
- Fleet thinking: utilization, goodput, capacity and supply planning, and cost discipline as first-class engineering concerns.
- Live-site reliability craft: on-call, incident management & response, and postmortem-driven continuous improvement.
- Strong command of the building blocks of a compute platform: multi-tenant isolation and security, scheduling, and resource management.
- Ability to review and raise the bar on technical artifacts (design docs, code reviews) across a distributed-systems codebase.
Nice-to-Haves
- MicroVMs and virtualization (Firecracker, gVisor, Edera) or managed-compute primitives (AWS Fargate, GCP Cloud Run, AWS Lambda), and/or Kubernetes internals.
- Building serverless or hosted-compute products from 0 to 1, including the rapid-delivery-vs-durable-platform tradeoffs that come with it.
- Multi-cloud delivery across AWS and GCP.
- Cold-start, warm-pool, and scheduling/latency optimization for on-demand compute.
- Agent sandboxes, secure execution of untrusted code, or other AI-agent infrastructure.
- GPU / accelerated compute: fractional GPUs (MIG, MPS, time-slicing), GPU scheduling, training vs. inference fleets, and multi-tenant GPU isolation.
Compensation & Benefits
- Estimated pay range: $320,000 - $335,000.
- Eligible to participate in Temporal's equity plan.
- Unlimited PTO, 12 Holidays + 2 Floating Holidays.
- 100% Premiums Coverage for Medical, Dental, and Vision.
- AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available).
- Empower 401K Plan.
- Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more!