Senior Manager, Data Center Operations
Leads data center operations across the Sparks, NV and San Jose, CA sites, managing white-space hardware, uptime/MTTR/power KPIs, a team of technicians, and vendor relations for high-density AI infrastructure. Requires 8+ years of experience and deep hardware and facilities expertise.
What You’ll Be Working On
KPI Architecture & Leadership Reporting
- Design and implement a robust framework of Key Performance Indicators (KPIs) from scratch
- Define and track metrics for uptime, MTTR (Mean Time to Repair), deployment velocity, and power utilization
- Provide data-driven updates to executive leadership
Infrastructure Oversight
- Act as the technical lead for the "white space" while maintaining a deep understanding of the specialized electrical and mechanical systems (UPS, PDUs, specialized cooling) that support our unique Sparks deployments
Regional Scale & Strategy
- Lead the operational rollout for Crusoe Sparks (NV) and the San Jose Lab (CA)
- Develop the roadmap for scaling operations as the West Coast Region expands
Hardware Lifecycle & Break-Fix
- Oversee the day-to-day maintenance of AI-optimized hardware
- Drive rapid diagnostics, component replacement (GPU trays, DIMMs, etc.), and streamlined RMA processes across the region
Lab-to-Production Pipeline
- Bridge the gap between the San Jose Lab and our Crusoe Cloud production sites
- Document deployment standards that allow seamless hardware transitions from experimental lab phases to large-scale production
Vendor & Landlord Relations
- Act as the primary liaison for colocation landlords and utility partners
- Hold them accountable to SLAs, ensuring the facility infrastructure meets the demanding requirements of our high-density AI clusters
Team Leadership
- Build, mentor, and scale a high-performing regional team of technicians
- Foster a culture of technical precision, safety, and operational discipline
What You’ll Bring to the Team
- Proven Leadership: 8+ years in data center operations, managing distributed white space or lab environments across multiple locations
- Infrastructure Fluency: A strong technical understanding of data center electrical and mechanical systems. You can speak the language of facilities engineers and understand the unique constraints of high-density AI power and cooling
- Analytical Rigor: Demonstrated experience defining and building operational metrics. You have a track record of using data to tell a story and drive process improvements
- Deep Hardware Expertise: Hands-on experience with enterprise-grade server architecture; specific experience with GPU-heavy clusters (NVIDIA/AMD) is highly preferred
- The Multi-Site Mindset: Experience operating in colocation or leased-space environments. You know how to manage diverse landlord relationships to protect Crusoe’s operational interests
- Tactical Versatility: You are as comfortable presenting high-level KPI dashboards to the VP of Operations as you are working on the floor with a crash cart and a multimeter
- Mobility & Reliability: Willingness to travel between Crusoe Cloud data center locations as needed, and the flexibility to support critical hardware failures or deployment pushes
Benefits
- Competitive compensation and equity packages
- Restricted Stock Units
- Paid time off, paid holidays & leave of absence programs
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) retirement plan with company match up to 4% of salary
- Volunteer time off
- Global travel insurance & emergency assistance
- Daily meal allowance
- Additional perks & programs specific to location
Compensation Range: $179,000 - $218,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Senior Infrastructure Engineer
Build analytics infrastructure, observability tooling, and developer platforms to support real-time AI agents for 911 centers. Requires 4+ years of infrastructure, platform, or backend experience and comfort across the full stack.
Lead Site Reliability Engineer
Lead SRE driving reliability strategy, infrastructure architecture, observability, and incident response for a B2B fintech platform on AWS and Kubernetes. Requires 7+ years of experience building production-grade distributed systems.
Senior Developer Experience Engineer
Senior Platform Engineer focused on developer experience, building tools, automation, CI/CD systems, and AI tooling to improve developer productivity and workflows. Requires 7+ years of cloud experience, hands-on containerization experience, and proficiency in Ruby, Go, or Python.