# Sr. Staff Technical Program Manager - Reliability
**Company:** [Databricks](https://hotfix.jobs/companies/databricks)
**Location:** Bellevue, WA; Seattle, WA
**Salary:** $180K-$243K
**Experience:** 10+ years
**Skills:** AWS, Azure, GCP, Distributed Systems, SRE, Jira, SLOs, Error Budgets, Chaos Engineering, Container Orchestration
**Posted:** 2026-03-02
> Leads reliability strategy and execution for Databricks' multi-cloud infrastructure, partnering with senior engineering leaders to drive roadmaps, programs, and best practices. Requires 10+ years in cloud/SRE with hyperscale experience.
## Job Description
## Responsibilities

### Lead Reliability Strategy + Multi-Quarter Roadmaps
- Partner with senior engineering leadership to define the long-term Reliability roadmap and ensure alignment across Platform Engineering, Compute Fleet Management, SRE, Security, and Cloud Partnerships.

### Drive Execution of Critical Reliability Programs
- Own end-to-end program execution: planning, risk management, dependency mapping, trade-off decisions, status reporting, and delivery.
- Identify process/architecture gaps and drive improvements with Tech Leads.

### Partner Deeply with Engineering & Influence Technical Direction
- Leverage infrastructure/SRE background to guide design and prioritization.
- Facilitate cross-functional alignment and apply systems thinking to improve scalability, fault tolerance, automation, and tooling.

### Elevate Reliability Culture
- Drive adoption of best practices: error budgets, incident reviews, design-for-resilience, operational readiness.
- Implement governance, processes, metrics, and documentation.

## Required Experience & Qualifications
- 10+ years managing large-scale technical programs in cloud infrastructure, distributed systems, SRE, or platform engineering.
- Experience with two or more hyperscale clouds (AWS, Azure, GCP) and multi-AZ/multi-region architecture.
- Track record of success leading Reliability programs (availability, failover, incident reduction).
- Strong understanding of infrastructure, distributed systems, and SRE; hands-on engineering or SRE experience preferred.
- Experience partnering with senior leadership on strategy and multi-team initiatives.
- Ability to translate ambiguous goals into plans with milestones and KPIs.
- Ability to manage cross-org dependencies, risks, and multi-quarter timelines.
- Experience delivering programs across multiple clouds and cloud-native services.
- Experience building and scaling engineering processes and frameworks.

## Preferred Qualifications
- Background in distributed systems engineering, SRE, platform infrastructure, or cloud services.
- Experience with compute fleets, container orchestration, autoscaling, and control-plane systems.
- Familiarity with SLOs, error budgets, chaos engineering, incident management.
- Expertise with Jira or equivalent.
- Bachelor’s degree in CS, Engineering, or a related field; advanced degree preferred.

**Apply:** https://hotfix.jobs/jobs/sr-staff-technical-program-manager-reliability-at-databricks-b52a7025-62fa-48b4-9ee1-b0dcda555ffc
**Canonical:** https://hotfix.jobs/jobs/sr-staff-technical-program-manager-reliability-at-databricks-b52a7025-62fa-48b4-9ee1-b0dcda555ffc