Senior Manager, Site Reliability Engineering - Infrastructure Platform

176k – 264k | Bellevue, WA | Hybrid | 6+ YOE | Feb 10

Summary

Leads the Infrastructure Platform and Shared Services teams, overseeing Edge networking, the Kubernetes platform, CI/CD, observability, and automation. Requires 6+ years of technical leadership, AWS expertise, and strong Kubernetes/Terraform skills.

About the role

The Infrastructure Platform and Shared Services Team

Okta authenticates, authorizes, and provisions millions of users a day. The service is hosted on Amazon Web Services (AWS) across multiple availability zones and geographically separated regions. The service is designed for high throughput and 99.999% availability.

As the Sr. Manager of Infrastructure Platform and Shared Services, you will oversee multiple teams focused on Edge networking, K8s platform, CI/CD, Observability, automation platform & tooling.

What you’ll be doing

Lead the Infra platform and shared services org and various initiatives across the SRE & Infrastructure organization.
Lead the DevOps transformation, microservice journey, and next-generation Infra platform capabilities in partnership with architects and product engineering.
Build a world-class observability platform and monitoring capabilities, enabled through self-service.
Accelerate the velocity of SRE and product engineering by developing robust platforms, powerful tooling, and intuitive self-service capabilities.
Own the design and operation of scalable, self-service Cloud infrastructure platforms (e.g., Kubernetes, service mesh, CI/CD pipelines, IaC & Edge Infrastructure).
Lead, mentor, and grow a high-performing team of engineers and managers across platform, infrastructure, and shared services domains.
Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.
Improve SDLC processes for Cloud infrastructure as code, including the maturity of CI/CD pipelines and change and release management.
Manage service and business expectations and prioritize resource allocation.
Maintain a deep knowledge of industry best practices, evolving trends, and technologies.

What you’ll bring to the role

6+ years of experience in technical leadership & people management
Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale
3+ years of experience running large-scale infrastructure platforms supporting a SaaS/Cloud service in a public Cloud, preferably AWS. Experience supporting a multi-Cloud environment will be a plus.
Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines
Strong background and hands-on experience in software development, PaaS, and automation
Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment.
Demonstrated ability to lead cross-functional teams and manage large-scale programs
Effective verbal and written communication and interpersonal skills
Degree in Computer Science or a related field, or equivalent experience

Skills

Kubernetes, AWS, Terraform, CI/CD, Grafana, Splunk, DevOps, IaC, Observability, Service Mesh
