Senior Staff Software Engineer, Managed Orchestration
Leads architecture and development of scalable managed Kubernetes and AI orchestration systems, providing technical direction for cloud infrastructure reliability and performance. Requires 10+ years in software engineering with deep expertise in Go, Kubernetes, and large-scale systems.
What You'll Be Working On
- Drive the development of scalable, resilient, and high-performance software solutions, ensuring alignment with and influence over the strategic objectives outlined in the Crusoe Cloud roadmap
- Provide technical leadership across multiple teams, fostering a culture of innovation, engineering excellence, and accountability while enabling teams to deliver cutting-edge cloud solutions
- Define and evolve architectural standards and best practices, ensuring consistency, scalability, and long-term maintainability across systems
- Continuously stay ahead of emerging trends and technologies in cloud software, proactively shaping Crusoe's technical direction and incorporating innovations that maintain competitive advantage
- Act as a mentor and multiplier for engineering talent, elevating team capabilities through coaching, design reviews, and thought leadership in technical discussions
- Lead cross-functional initiatives and drive alignment between engineering, product, and infrastructure teams to deliver cohesive and impactful solutions
What You'll Bring to the Team
- 10+ years of experience working in software engineering, with deep expertise in Systems Engineering and large-scale distributed systems
- 3+ years of programming experience in Go, with a track record of delivering production-grade systems
- Extensive experience with Kubernetes and Linux Engineering, including advanced debugging and performance optimization
- Highly skilled in infrastructure as code, with a strong understanding of complex systems-level challenges at scale
- Experience with Terraform and GCP (preferred), with the ability to influence platform-level decisions
- Strong understanding of Argo, CI/CD, and Automated Testing pipelines, including designing and scaling them for large organizations
- Can architect, build, and evolve Kubernetes operators and controllers, owning critical components that ensure the reliability, scalability, and efficiency of the Kubernetes environment
- Experience designing and operating large-scale systems comparable to leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
- Can lead and deliver critical, high-impact projects, driving initiatives across networking, quality control, automation, and system reliability at an organizational level
- Can define and own system architecture end-to-end, including CI/CD pipelines, ensuring scalability, security, and long-term sustainability
- Exceptional communication skills, with the ability to influence technical and non-technical stakeholders and drive alignment across the organization
Compensation
Compensation will be paid in the range of $237,600 - $288,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Lead Site Reliability Engineer
Lead SRE driving reliability strategy, infrastructure architecture, observability, and incident response for a B2B fintech platform on AWS and Kubernetes. Requires 7+ years building production-grade distributed systems.
Staff Network Engineer, Operations
Staff-level network operations engineer responsible for production reliability, incident response, and operational excellence across Crusoe's global edge, backbone, data center, and GPU cluster networks supporting AI workloads.
Senior Software Engineer - Internal Observability
Senior engineer building AI-powered observability systems and large-scale telemetry pipelines for Snowflake's multi-cloud data platform. Requires 7+ years focused on distributed systems and cloud services.
Platform Engineer
Own AWS infrastructure, Pulumi IaC, deployment pipelines, and security baseline for an AI research platform serving financial institutions. First dedicated platform hire defining enterprise deployment, SOC 2 controls, and developer experience.