Staff Storage Systems Engineer

180k – 225k · San Francisco, CA · Onsite · 10+ YOE · Apr 15

Summary

Leads architecture, operation, and optimization of petabyte-scale storage systems for AI/HPC workloads, including performance tuning, vendor evaluation, and RFP processes. Requires 10+ years in storage administration with deep expertise in enterprise arrays and I/O optimization.

About the role

What You'll Be Working On

Performance Analysis & Optimization

Evaluate performance of block, file, and object storage systems across diverse workloads.
Identify bottlenecks at the hardware, firmware, OS, and application layers.
Develop and execute performance test plans, benchmarks, and stress tests.
Tune storage stacks (I/O schedulers, caching layers, drivers, protocols) to achieve target KPIs.

Validation & Testing

Design and execute Proof of Concept (PoC) exercises to put new arrays through their paces.
Validate new vendor software releases in staging environments before rolling them out to the global production footprint.

Full-Stack Administration

Own the initial bring-up, configuration, and ongoing performance tuning of large enterprise arrays.
Manage the lifecycle of the storage OS, ensuring all systems are optimized for AI training and inference I/O patterns.

Enterprise Infrastructure Building

Collaborate with the Compute and Networking teams to build a seamless "gold standard" cloud infrastructure.
Design cloud-scale storage systems that can excel in high-concurrency, high-throughput environments.

Storage Strategy & Selection

Lead the technical evaluation of new storage technologies.
Author RFPs, review vendor responses, and lead "down selection" processes to ensure investment in the best hardware for AI workloads.

Vendor Roadmap Influence

Serve as the primary technical point of contact for storage partners (such as VAST Data, Pure Storage).
Sit with their engineering teams to provide feedback on bugs and missing features, and to prioritize Crusoe’s requirements on their development roadmaps.

Cross-Functional Collaboration

Work closely with service engineering and architecture teams to influence design decisions.
Provide performance guidance during feature development and release cycles.
Communicate findings to both technical and non-technical stakeholders.

What You'll Bring to the Team

10+ years of experience in storage systems administration with a heavy focus on petabyte-scale, on-premises data environments.
Strong understanding of storage architectures (block, file, object) and I/O paths.
Hands-on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.).
Experience with SSDs, NVMe, RAID, caching, or distributed storage systems.
Deep familiarity with enterprise flash arrays and distributed file systems. Specific experience with VAST Data or Pure Storage (Everpure) is highly preferred.
Proficiency with scripting (Python, Go or bash) to automate array management and monitoring.
Ability to analyze complex performance data and present clear conclusions.
Proven ability to lead the authoring of technical requirements, evaluate RFP responses, and manage complex vendor relationships.
Experience with system design for specific I/O use cases (AI training/inference) and a disciplined approach to testing and validating new vendor releases.

Bonus Points

Experience with RDMA, iSCSI, NVMe-oF, RoCEv2, or InfiniBand networking as it relates to high-performance storage.
Previous experience at a major Cloud Service Provider (CSP) or a high-scale AI infrastructure company.
Familiarity with distributed storage systems (Ceph, Lustre, Gluster, etc.).

Benefits & Compensation

Compensation Range: $180,000 - $225,000 + Bonus. Restricted Stock Units are included in all offers.

Skills

VAST Data, Pure Storage, NVMe, FIO, ElBencho, blktrace, nvme-cli, eBPF, Python, Go, bash, SSDs, RAID, Ceph, Lustre
