Senior Manager, Data Center Operations

179k – 218k | Sparks, NV | San Jose, CA | Sunnyvale, CA | Onsite | 8+ YOE | May 4

Summary

Leads data center operations across the Sparks, NV and San Jose, CA sites, overseeing white-space hardware; KPIs for uptime, MTTR, and power utilization; a team of technicians; and vendor relations for high-density AI infrastructure. Requires 8+ years of experience with deep hardware and facilities expertise.

About the role

What You’ll Be Working On

KPI Architecture & Leadership Reporting

Design and implement a robust framework of Key Performance Indicators (KPIs) from scratch
Define and track metrics for uptime, MTTR (Mean Time to Repair), deployment velocity, and power utilization
Provide data-driven updates to executive leadership

Infrastructure Oversight

Act as the technical lead for the "white space" while maintaining a deep understanding of the specialized electrical and mechanical systems (UPS, PDUs, specialized cooling) that support our unique Sparks deployments

Regional Scale & Strategy

Lead the operational rollout for Crusoe Sparks (NV) and the San Jose Lab (CA)
Develop the roadmap for scaling operations as the West Coast Region expands

Hardware Lifecycle & Break-Fix

Oversee the day-to-day maintenance of AI-optimized hardware
Drive rapid diagnostics, component replacement (GPU trays, DIMMs, etc.), and streamlined RMA processes across the region

Lab-to-Production Pipeline

Bridge the gap between the San Jose Lab and our Crusoe Cloud production sites
Document deployment standards that allow seamless hardware transitions from experimental lab phases to large-scale production

Vendor & Landlord Relations

Act as the primary liaison for colocation landlords and utility partners
Hold them accountable to SLAs, ensuring the facility infrastructure meets the demanding requirements of our high-density AI clusters

Team Leadership

Build, mentor, and scale a high-performing regional team of technicians
Foster a culture of technical precision, safety, and operational discipline

What You’ll Bring to the Team

Proven Leadership: 8+ years in data center operations, managing distributed white space or lab environments across multiple locations
Infrastructure Fluency: A strong technical understanding of data center electrical and mechanical systems. You can speak the language of facilities engineers and understand the unique constraints of high-density AI power and cooling
Analytical Rigor: Demonstrated experience defining and building operational metrics. You have a track record of using data to tell a story and drive process improvements
Deep Hardware Expertise: Hands-on experience with enterprise-grade server architecture; specific experience with GPU-heavy clusters (NVIDIA/AMD) is highly preferred
The Multi-Site Mindset: Experience operating in colocation or leased-space environments. You know how to manage diverse landlord relationships to protect Crusoe’s operational interests
Tactical Versatility: You are equally comfortable presenting high-level KPI dashboards to the VP of Operations as you are on the floor with a crash cart and a multimeter
Mobility & Reliability: Willingness to travel between Crusoe Cloud data center locations as needed, and the flexibility to support critical hardware failures or deployment pushes

Benefits

Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer contributions to HSA
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meal allowance
Additional perks & programs specific to location

Compensation Range: $179,000 - $218,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Skills

Data Center Operations, KPIs, MTTR, UPS, PDUs, GPU Clusters, NVIDIA, AMD, Servers, RMA, Colocation, Facilities Engineering, High-Density Cooling, Power Utilization
