Staff Site Reliability Engineer, Kubernetes w/ active TS/SCI

188k – 235k · Washington, DC · Hybrid · 5+ YOE
Summary

Senior SRE focused on Kubernetes-orchestrated cloud infrastructure for high-stakes national security environments. Manages reliability, incident response, automation, and scalability. Requires an active TS/SCI clearance and 5+ years of Kubernetes experience.

About the role

What You’ll Do

  • Infrastructure Excellence: Design, deploy, and monitor Okta’s production infrastructure to ensure peak performance and reliability.
  • Incident Management: Serve as a frontline responder to production incidents, performing deep-dive troubleshooting and implementing permanent preventive solutions.
  • Aggressive Automation: Eliminate manual toil by developing automation scripts, evolving monitoring tools, and documenting technical workflows.
  • Scalability: Support a highly available, large-scale environment as part of an on-call rotation, ensuring "Always On" service delivery.

What You’ll Bring

Core Requirements

  • Clearance & Citizenship: Active TS/SCI clearance.
  • Federal Compliance: Deep familiarity with FedRAMP and DoD IL6 compliance standards.
  • Education: B.S. in Computer Science or equivalent professional experience.

Technical Expertise

  • Kubernetes Mastery: 5+ years of experience building and operating workloads orchestrated by Kubernetes, including expert-level debugging of Helm values and charts.
  • Systems & Scripting: Strong Linux systems administration background with proficiency in Go, Python, Bash, or Ruby.
  • Cloud Infrastructure: Expertise in AWS services (EC2, ECS, KMS, CloudWatch) and Infrastructure as Code (Terraform or CloudFormation).
  • Production Support: Experience managing Docker containers and web applications (Java/Apache/Tomcat) in high-traffic live environments.

  • Networking: Solid understanding of networking concepts and IP protocols; experience with multi-cloud environments is a significant plus.

Skills
Kubernetes, Helm, AWS, Terraform, Docker, Linux, Python, Go, Bash, FedRAMP