Sr. Manager - Production Engineering
Lead an engineering team managing cloud IAM operations, CSP provisioning, compliance, and security data pipelines across AWS, Azure, and GCP. Requires 8+ years in security or cloud engineering, 5+ years of management experience, and a BS in a technical field.
The impact you will have
- Build, lead, and grow a high-performing team of engineers responsible for cloud IAM operations, CSP environment management, security data pipelines, and compliance operations across AWS, Azure, and GCP.
- Define and execute the strategy and roadmap for automating cloud access assignment, approval workflows, and IAM policy enforcement to reduce manual toil while strengthening security controls.
- Own the end-to-end lifecycle of CSP account, subscription, and project provisioning, including secure onboarding of acquired companies' cloud environments into Databricks' organizations with minimal disruption.
- Drive compliance programs including Cloud User Access Reviews, audit evidence collection, and IAM policy alignment to meet SOC2, FedRAMP, and other regulatory requirements.
- Ensure the reliability and timeliness of security data pipelines that ingest CSP audit logging, enabling downstream detection and response capabilities.
- Partner closely with Security, Security Engineering, and IT to interpret and operationalize security policies, ensuring consistent enforcement with high transparency and minimal friction for engineering teams.
- Lead complex, multi-quarter initiatives spanning multiple teams and external partners, demonstrating leverage by executing through technical leads and developing future leaders on the team.
- Lead GovCloud escort operations, including staffing a 24x7 on-call rotation for SEV0 incidents, and continuously improve operational resilience.
What we look for
- 8+ years of experience in security engineering, cloud infrastructure, or production/site reliability engineering, with deep hands-on expertise in at least one major cloud provider (AWS, Azure, or GCP).
- 5+ years of engineering management experience, including building teams, developing senior engineers and managers, and navigating complex people situations such as promotions and performance management.
- Strong technical understanding of cloud IAM (policies, principals, roles, federation), CSP organizational structures, and identity governance frameworks.
- Demonstrated success leading operational teams that balance high-throughput request execution with automation and process improvement.
- Experience with compliance and audit workflows (SOC2, FedRAMP, ISO, or similar), including evidence collection and access review programs.
- Track record of driving cross-functional initiatives that require influencing without direct authority across security, infrastructure, and engineering organizations.
- Strong communication and stakeholder management skills, with the ability to translate security policy into practical operational processes that engineering teams can follow.
- BS (or higher) in Computer Science, Information Security, or a related technical field.
Lead Site Reliability Engineer
Lead SRE driving reliability strategy, infrastructure architecture, observability, and incident response for a B2B fintech platform on AWS and Kubernetes. Requires 7+ years building production-grade distributed systems.
Staff Network Engineer, Operations
Staff-level network operations engineer responsible for production reliability, incident response, and operational excellence across Crusoe's global edge, backbone, data center, and GPU cluster networks supporting AI workloads.
Staff Software Engineer, AI Developer Tools
Staff-level engineer architecting AI-native developer tools and infrastructure to accelerate engineering velocity across Gusto. Requires 8+ years of experience building production AI systems with deep expertise in LLMs, RAG, and multi-agent workflows.