Staff Storage Systems Engineer
Leads architecture, operation, and optimization of petabyte-scale storage systems for AI/HPC workloads, including performance tuning, vendor evaluation, and RFP processes. Requires 10+ years in storage administration with deep expertise in enterprise arrays and I/O optimization.
What You'll Be Working On
Performance Analysis & Optimization
- Evaluate performance of block, file, and object storage systems across diverse workloads.
- Identify bottlenecks at the hardware, firmware, OS, and application layers.
- Develop and execute performance test plans, benchmarks, and stress tests.
- Tune storage stacks (I/O schedulers, caching layers, drivers, protocols) to achieve target KPIs.
Validation & Testing
- Design and execute Proof of Concept (PoC) exercises to put new arrays through their paces.
- Validate new vendor software releases in staging environments before rolling them out to the global production footprint.
Full-Stack Administration
- Own the initial bring-up, configuration, and ongoing performance tuning of large enterprise arrays.
- Manage the lifecycle of the storage OS, ensuring all systems are optimized for AI training and inference I/O patterns.
Enterprise Infrastructure Building
- Collaborate with the Compute and Networking teams to build a seamless "gold standard" cloud infrastructure.
- Design cloud-scale storage systems that can excel in high-concurrency, high-throughput environments.
Storage Strategy & Selection
- Lead the technical evaluation of new storage technologies.
- Author RFPs, review vendor responses, and lead "down selection" processes to ensure investment in the best hardware for AI workloads.
Vendor Roadmap Influence
- Serve as the primary technical point of contact for storage partners (such as VAST Data, Pure Storage).
- Sit with their engineering teams to provide feedback on bugs and missing features, and to prioritize Crusoe’s requirements on their development roadmaps.
Cross-Functional Collaboration
- Work closely with service engineering and architecture teams to influence design decisions.
- Provide performance guidance during feature development and release cycles.
- Communicate findings to both technical and non-technical stakeholders.
What You'll Bring to the Team
- 10+ years of experience in storage systems administration with a heavy focus on petabyte-scale, on-premises data environments.
- Strong understanding of storage architectures (block, file, object) and I/O paths.
- Hands-on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.).
- Experience with SSDs, NVMe, RAID, caching, or distributed storage systems.
- Deep familiarity with enterprise flash arrays and distributed file systems. Specific experience with VAST Data or Pure Storage (Everpure) is highly preferred.
- Proficiency with scripting (Python, Go, or Bash) to automate array management and monitoring.
- Ability to analyze complex performance data and present clear conclusions.
- Proven ability to lead the authoring of technical requirements, evaluate RFP responses, and manage complex vendor relationships.
- Experience with system design for specific I/O use cases (AI training/inference) and a disciplined approach to testing and validating new vendor releases.
Bonus Points
- Experience with RDMA, iSCSI, NVMe-oF, RoCEv2, or InfiniBand networking as it relates to high-performance storage.
- Previous experience at a major Cloud Service Provider (CSP) or a high-scale AI infrastructure company.
- Familiarity with distributed storage systems (Ceph, Lustre, Gluster, etc.).
Benefits & Compensation
Compensation Range: $180,000 - $225,000 + Bonus. Restricted Stock Units are included in all offers.