# Director of SRE (FTE)
**Company:** [IntusCare](https://hotfix.jobs/companies/intuscare)
**Location:** Remote
**Salary:** $175K-$200K
**Experience:** 12+ years
**Skills:** Kubernetes, Microsoft Azure, Grafana, Prometheus, GitHub Actions, Argo CD, Terraform, Datadog, Splunk, Python
**Posted:** 2026-04-29
> Leads SRE strategy, reliability, observability, incident management, and QA for a cloud-native healthcare EMR platform. Oversees Azure AKS, Kubernetes, CI/CD, and vendor teams; requires 12+ years of SRE experience, including 5+ years in leadership.
## Job Description
## Key Responsibilities
- Own and execute the SRE strategy and multi-quarter roadmap across reliability, observability, incident management, QA maturity, and release engineering.
- Define, measure, and continuously improve SLAs, SLOs, error budgets, uptime, performance, and operational health metrics across all products and services.
- Lead production reliability for the full platform, including monitoring, alerting, on-call operations, incident response, root cause analysis, and MTTR reduction.
- Establish release readiness standards, deployment safety controls, and quality gates to ensure stable and predictable product releases.
- Manage external SRE vendors and partners, including service delivery, SLA governance, escalations, performance reviews, and compliance expectations.
- Lead QA engineering strategy with a focus on automation, regression prevention, test coverage, and reducing escaped defects in production.
- Partner with Security and Engineering leaders to ensure cloud infrastructure, CI/CD pipelines, and operational tooling meet HIPAA, SOC 2, and internal security standards.
- Oversee core platform operations including Azure AKS environments, **Kubernetes**, GitOps workflows, CI/CD pipelines, **GitHub Actions**, secrets management, access controls, and audit readiness.
- Drive observability maturity using tools such as **Grafana**, **Prometheus**, logging platforms, tracing tools, and automated alerting frameworks.
- Collaborate with Product, Platform, and Engineering teams to embed reliability and quality best practices throughout the software development lifecycle.
- Build, mentor, and scale high-performing SRE and QA teams while fostering a culture of ownership, accountability, learning, and continuous improvement.
- Drive adoption of AI-enabled automation and intelligent tooling to reduce manual toil, improve productivity, and strengthen operational excellence.

## Technical Experience
- Strong hands-on experience with cloud infrastructure, preferably **Microsoft Azure**, including AKS, networking, storage, IAM, and security services.
- Deep expertise in **Kubernetes**, containerized workloads, and production-scale distributed systems.
- Experience building and managing CI/CD pipelines using **GitHub Actions**, **Argo CD**, **Terraform**, or similar DevOps tooling.
- Strong background in monitoring, logging, tracing, and observability platforms such as **Grafana**, **Prometheus**, **Datadog**, **Splunk**, or equivalent.
- Experience with scripting and automation using **Python**, **Bash**, **PowerShell**, or similar languages.
- Strong understanding of release engineering, automated testing frameworks, QA tooling, and shift-left quality practices.
- Experience supporting SaaS applications with uptime, scalability, and security requirements in regulated industries such as healthcare.
- Knowledge of HIPAA, SOC 2, vulnerability management, access controls, and infrastructure security best practices.
- Familiarity with databases, APIs, networking, and troubleshooting across modern web application stacks.
- Exposure to AI-powered DevOps / AIOps tooling for incident management, automation, and engineering productivity is a plus.

## Requirements
- 12+ years of SRE, infrastructure, or platform engineering experience, including 5+ years in engineering leadership roles.
- Proven track record of owning site reliability for complex, multi-tenant SaaS platforms with demanding availability requirements.
- Demonstrated experience defining SLA and SLO frameworks, error budgets, and incident management processes at scale.
- Experience managing vendor relationships for managed infrastructure or SRE services, including SLA governance and performance management.
- Track record of leading QA or quality engineering functions, including test automation maturity and release gate ownership.
- Strong communication and cross-functional influence skills, with the ability to represent reliability to both technical and non-technical audiences.

## Preferred Qualifications
- Experience in healthcare technology, HIPAA-compliant environments, or other highly regulated SaaS industries.
- Familiarity with FHIR-native or EMR/EHR platform architectures and their specific reliability requirements.
- Experience implementing AI-assisted SRE automation including runbook generation, anomaly detection, or incident triage tooling.
- Background working with **Playwright** or equivalent test automation frameworks in a QA leadership capacity.
- Experience building internal SRE capability alongside a managed services provider.

## Compensation
- Base salary range: $175K-$200K, plus a variable component and stock options.

**Apply:** https://hotfix.jobs/jobs/director-of-sre-fte-at-intuscare-70c68bc7-0427-46cb-a55f-d74b0d9ba4f9
**Canonical:** https://hotfix.jobs/jobs/director-of-sre-fte-at-intuscare-70c68bc7-0427-46cb-a55f-d74b0d9ba4f9