Sr Staff Production Engineer - Public Sector
Sr Staff Production Engineer who owns secure cloud infrastructure, IAM, and automation across AWS, Azure, and GCP for public sector and regulated environments. Requires 12+ years of experience, cloud expertise, and TS/SCI clearance eligibility.
The Impact You'll Have
Security-Focused Cloud Operations
- Design, automate, and operate the IAM, account/subscription, and project lifecycle across AWS, Azure, and GCP, enforcing least-privilege and standardized access patterns at scale.
- Review, implement, and continuously improve cloud identity and access policies (IAM, Okta, Opal) to align with Databricks security standards and audit requirements.
Production Engineering & Automation
- Build and maintain reliable, observable automation and tooling to apply cloud changes (roles, policies, accounts, networking) safely and repeatedly.
- Treat operational and security issues as software problems: eliminate toil, drive root-cause analysis, and codify fixes into infrastructure and tooling.
Security Data Pipelines & Compliance
- Own and improve security and audit logging data pipelines from cloud providers into our internal systems, ensuring timely, accurate data for detection, investigations, and audits.
- Partner with Security, Compliance, and Audit teams to provide evidence, clarifications, and policy updates that keep our environments aligned with evolving standards.
Regulated & Specialized Environments
- Operate and improve specialized, highly regulated environments (e.g., FedRAMP / GovCloud), including release management, patching cadences, and secure access workflows (e.g., secure admin workstations, or SAW).
- Ensure high availability and resiliency for critical security and access infrastructure across these environments.
On-Call & Incident Response
- Participate in a 24x7 on-call rotation for high-severity incidents impacting cloud accounts, IAM, or security data pipelines.
- Act as a key partner to product engineering, security engineering, and field teams during incidents to restore service and harden systems for the future.
What We Look For
Required: Candidates must be eligible for a Top Secret / Sensitive Compartmented Information (TS/SCI) security clearance.
Nice to have: Possession of a current polygraph (Counterintelligence or Full Scope) is highly desired and considered a significant plus.
Education: BS, MS, or PhD in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
Experience: 12+ years of experience, including leading the strategy for cloud IAM, account architecture, or security-critical infrastructure across multiple environments or business units.
Cloud & Infrastructure Expertise
- Deep hands-on experience with at least one major cloud provider (AWS, Azure, or GCP) in areas such as IAM, networking, accounts/subscriptions/projects, and audit logging.
- Strong background in Infrastructure-as-Code and automation (e.g., Terraform, CloudFormation, or similar) and CI/CD for infrastructure changes.
Security & Compliance Mindset
- Proven experience working in or with security-sensitive or regulated environments (e.g., SOC 2, FedRAMP, ISO 27001, financial services, public sector) and translating their requirements into concrete technical controls.
- Familiarity with access review processes, policy baselines, and audit evidence for cloud environments.
Operational Excellence
- Demonstrated success running high-availability, security-critical services, including on-call responsibilities and incident management.
- Strong debugging and problem-solving skills across distributed systems, with the ability to navigate ambiguous issues spanning multiple teams and platforms.
Bonus
- Experience with Okta, Opal, or similar identity/access tooling.
- Background operating secure admin workstations (SAW) or comparable hardened access patterns.
- Experience migrating cloud accounts or subscriptions during M&A or large-scale reorganizations.
Pay Range: $195,400 to $268,600 USD