Staff Site Reliability Engineer, Core IDaaS w/ active TS/SCI

188k – 259k | Washington, DC | Hybrid | 6+ YOE
Summary

Leads SRE for Core IDaaS in federal air-gapped environments, designing AWS infrastructure with Terraform/Helm/Go, managing incidents, and ensuring compliance (FedRAMP/IL6). Requires 6+ years of SRE experience, an active TS/SCI clearance, and a bachelor's degree.

About the role

What You’ll Do

  • Cloud & Air-Gapped Infrastructure: Design and deliver AWS-based projects, primarily writing Terraform, Helm, and Go, and adapting deployments for secure federal air-gapped environments.
  • Incident Management: Respond to and remediate production incidents, performing deep-dive troubleshooting, driving rapid response, and implementing permanent preventive solutions.
  • Engineering Standards: Drive high-quality code and operational rigor through design reviews, code reviews, attention to detail, and deep technical expertise.
  • Technical Leadership: Mentor junior engineers and collaborate with cross-functional teams to deliver secure, enterprise-grade solutions.

What You’ll Bring

Core Qualifications

  • Security Clearance: Active U.S. TS/SCI clearance.
  • Compliance Expertise: Proven experience navigating Federal and DoD compliance frameworks, specifically FedRAMP and Impact Level 6 (IL6).
  • Domain Authority: Deep expertise in architecting, deploying, and optimizing software within federal air-gapped environments.
  • Education: Bachelor’s degree in Computer Science or a related technical field (Master’s degree preferred).

Technical Expertise

  • Site Reliability Engineering: 6+ years of professional experience running production cloud workloads at scale. 4+ years of experience developing and troubleshooting web services on Kubernetes or similar orchestration layers.
  • Broad Database Knowledge: Significant hands-on experience with both relational and non-relational datastores.
  • Interactive and Batch Workloads: Demonstrated success delivering and maintaining both customer-facing interactive workloads and large-scale batch processing – ideally using data warehouse products such as Snowflake, Redshift, or Databricks.
  • Security & Engineering: Strong foundational knowledge of network security (authentication/authorization) and a commitment to rigorous software engineering best practices.
  • Industry Experience: Prior experience supporting or building mission-critical Enterprise SaaS platforms.
Skills
Terraform, Helm, Go, Kubernetes, AWS, FedRAMP, Snowflake, Redshift, Databricks, SRE
Similar roles at this salary range
Crusoe

Staff Software Engineer, Developer Experience

Staff-level engineer building developer tools, infrastructure, and automation to accelerate Crusoe engineering productivity. Requires Go, Kubernetes, CI/CD, and strong DevOps/SRE experience.

209k – 253k | San Francisco, CA +1 | DevOps / SRE | On-site | Go, Git
Aurelian

Staff Infrastructure Engineer

Build infrastructure, observability, and developer tooling for a real-time AI platform serving 911 centers. Requires 6+ years of infrastructure/platform/backend experience and comfort across the full stack.

180k – 240k | Seattle, WA | DevOps / SRE | On-site | Logging, ClickHouse
Stuut

Lead Site Reliability Engineer

Lead SRE driving reliability strategy, infrastructure architecture, observability, and incident response for a B2B fintech platform on AWS and Kubernetes. Requires 7+ years building production-grade distributed systems.

200k – 275k | San Francisco, CA | DevOps / SRE | On-site | AWS, EKS
Huntress

Senior Developer Experience Engineer

Senior Platform Engineer focused on Developer Experience, building tools, automation, CI/CD systems, and AI tooling to improve developer productivity and workflows. Requires 7+ years of cloud experience, containerization skills, and proficiency in Ruby, Go, or Python.

160k – 190k | United States | DevOps / SRE | Remote | Go, Ruby
Crusoe

Staff Network Engineer, Operations

Staff-level network operations engineer responsible for production reliability, incident response, and operational excellence across Crusoe's global edge, backbone, data center, and GPU cluster networks supporting AI workloads.

195k – 235k | San Francisco, CA | DevOps / SRE | On-site | BGP, QoS