Crusoe DevOps / SRE Jobs
Open DevOps / SRE roles at Crusoe, pulled live from the company's hiring system.
DevOps / SRE roles at Crusoe cluster around $193k, with most listings between $167k and $209k. The most common stacks in the current DevOps / SRE listings are Kubernetes, Python, and Go. Most of these roles are on-site or hybrid; 2% are fully remote.
Staff Network Engineer, Operations
Staff-level network operations engineer responsible for production reliability, incident response, and operational excellence across Crusoe's global edge, backbone, data center, and GPU cluster networks supporting AI workloads.
Senior Staff Network Engineer, Operations
The Senior Staff Network Operations Engineer will own production reliability for Crusoe's global network, including edge, backbone, data center fabric, and GPU cluster interconnects. This role involves leading incident response, driving root cause analysis, defining SLIs/SLOs, and setting operational standards to maintain hyperscale AI infrastructure health.
Senior Staff Network Engineer, Automation
Senior technical leader owning Crusoe's network automation platform, source of truth, intent-based config systems, and self-healing workflows across hyperscale multi-vendor fabrics. Requires 12+ years of production network automation experience with deep expertise in Python/Go, model-driven telemetry, and observability at 10K+ device scale.
Senior Production Engineer
As a Senior Production Engineer, you will ensure the reliability and scalability of Crusoe’s AI-optimized cloud platform, focusing on designing and operating managed AI services for LLM workloads. You will build automation and reliability tooling, define SLIs/SLOs, and optimize large-scale training and inference clusters.
Senior Staff Network Engineer, Deployment
Senior technical leader owning global network infrastructure deployment strategy, automation platforms, and standards for Crusoe's hyperscale AI data centers. Requires 12+ years of large-scale data center deployment experience with deep expertise in Arista, Juniper, and NVIDIA platforms.
Staff Software Engineer, Managed Orchestration (Managed Kubernetes)
Staff Software Engineer designs, builds, and scales managed Kubernetes and AI training clusters, focusing on reliability, performance, and orchestration using Go, Terraform, and GCP. Oversees architecture, CI/CD pipelines, and critical infrastructure projects. Requires 8+ years of experience.
Plant Engineer
Oversees operations, maintenance, and efficiency of natural gas turbines powering AI data centers. Leads engineering design, compliance, troubleshooting, root cause analysis, and contractor management to ensure high availability of mission-critical power infrastructure. Requires bachelor's in engineering and turbine expertise.
Senior Production Engineer, Operational Excellence
Senior Production Engineer ensures reliability, scalability, and performance of GPU cloud infrastructure powering AI workloads. Drives observability, incident response, automation, and operational improvements in large-scale distributed systems.
Staff Infrastructure Engineer
Staff Infrastructure Engineer manages cloud infrastructure operations, develops automation for server provisioning, scales deployments, troubleshoots GPU hardware, and leads Kubernetes transition. Requires strong Linux, hardware, and Kubernetes expertise.
Senior Manager, Data Center Operations
Leads data center operations across Sparks, NV and San Jose, CA sites, managing white-space hardware, KPIs for uptime/MTTR/power, team of technicians, and vendor relations for high-density AI infrastructure. Requires 8+ years experience with deep hardware and facilities expertise.
Senior Storage Systems Engineer
Senior Storage Systems Engineer manages VAST Data and Pure Storage flash arrays for high-performance AI/HPC workloads, handling administration, performance monitoring, non-disruptive upgrades, data protection, Tier 3 support, and automation. Requires 5+ years storage experience, Linux proficiency, and protocol expertise.
Staff Network Engineer, Deployment
Leads physical and logical deployment of network infrastructure in data centers for AI/HPC, including rack/stack, testing, automation with Python/Ansible, and partner coordination. Requires 8+ years experience with Arista, Juniper, NVIDIA hardware, BGP/EVPN, and physical layer expertise.
Senior Virtualization Validation Engineer
Validates large-scale multi-node GPU clusters using QEMU and Cloud Hypervisor, focusing on interconnects like NVLink/InfiniBand, collective communications (NCCL/RCCL), and performance in virtualized AI/HPC environments. Requires 5+ years experience, virtualization expertise, and Linux kernel knowledge.
Staff Instrumentation & Controls Engineer, Deployment
Leads deployment, integration, and startup of BMS/EPMS/SCADA systems in data centers, ensuring seamless operation of HVAC, electrical, and monitoring infrastructure. Oversees contractors, troubleshoots protocols like BACnet/Modbus, and conducts testing/handover; requires bachelor's in engineering and hands-on data center experience.
Electrical Field Engineer - Data Center
Electrical Field Engineer supports on-site installation, testing, and commissioning of data center power systems like switchgear, transformers, UPS, and generators. Requires 5+ years experience, Bachelor's in Electrical Engineering, and 50%+ travel to sites.
Senior API Integration Engineer
Leads design and delivery of enterprise API integrations using Workato for People Tech ecosystem, automating workflows across ERP, CRM, HCM systems. Requires 7+ years experience with Workato recipes, iPaaS patterns, API security, and stakeholder collaboration.
Senior Staff Software Engineer, Managed Orchestration
Leads architecture and development of scalable managed Kubernetes and AI orchestration systems, providing technical direction for cloud infrastructure reliability and performance. Requires 10+ years in software engineering with deep expertise in Go, Kubernetes, and large-scale systems.
Staff Storage Systems Engineer
Leads architecture, operation, and optimization of petabyte-scale storage systems for AI/HPC workloads, including performance tuning, vendor evaluation, and RFP processes. Requires 10+ years in storage administration with deep expertise in enterprise arrays and I/O optimization.
Staff Software Engineer, CAPE
Architects and builds intelligence layer for GPU fleet management, including Virtual Pool Service and Capacity Management Intelligence systems. Requires 10+ years in distributed systems, fluency in Go or similar, and Bachelor's in CS.
Staff Software Engineer, Systems Engineering Focus
Designs, builds, and scales customer-facing managed services with a focus on edge agents running on customer infrastructure. Provides technical oversight for high-reliability systems using eBPF, Kubernetes, and low-level Linux metrics; leads cross-team collaboration and mentors engineers.
Associate Systems Software Engineer
Develops Linux-based compute applications for managing virtualization stacks across AI compute servers, integrates with AI hardware like GPUs and NICs, and optimizes performance for AI/ML workloads in datacenters. Requires Linux kernel familiarity, systems programming, and hardware integration skills.
Commissioning Engineer II
Hands-on commissioning engineer supporting test execution, documentation, and coordination for data center MEP, BMS, and power/cooling systems. Requires 2+ years experience, engineering degree, and 75% travel across project sites.
Storage Systems Administrator II
Manages daily operations, health monitoring, maintenance, and troubleshooting of VAST Data and Pure Storage all-flash systems to support high-performance AI workloads. Requires 2-6 years storage/systems admin experience, Linux proficiency, scripting, and high-performance protocols.
Senior Staff Engineer, Cloud Site Operations
Leads technical architecture for data center operations, overseeing global ticket queues, fleet supportability, power topology, resilience planning, and hardware failure escalations for AI infrastructure. Requires 10+ years in data center ops or HPC with deep NVIDIA GPU expertise.