Staff Storage Systems Engineer

180k – 225k | San Francisco, CA | Onsite | 10+ YOE
Summary

Leads architecture, operation, and optimization of petabyte-scale storage systems for AI/HPC workloads, including performance tuning, vendor evaluation, and RFP processes. Requires 10+ years in storage administration with deep expertise in enterprise arrays and I/O optimization.

About the role

What You'll Be Working On

Performance Analysis & Optimization

  • Evaluate performance of block, file, and object storage systems across diverse workloads.
  • Identify bottlenecks at the hardware, firmware, OS, and application layers.
  • Develop and execute performance test plans, benchmarks, and stress tests.
  • Tune storage stacks (I/O schedulers, caching layers, drivers, protocols) to achieve target KPIs.

Validation & Testing

  • Design and execute Proof of Concept (PoC) exercises to put new arrays through their paces.
  • Validate new vendor software releases in staging environments before rolling them out to the global production footprint.

Full-Stack Administration

  • Own the initial bring-up, configuration, and ongoing performance tuning of large enterprise arrays.
  • Manage the lifecycle of the storage OS, ensuring all systems are optimized for AI training and inference I/O patterns.

Enterprise Infrastructure Building

  • Collaborate with the Compute and Networking teams to build a seamless "gold standard" cloud infrastructure.
  • Design cloud-scale storage systems that can excel in high-concurrency, high-throughput environments.

Storage Strategy & Selection

  • Lead the technical evaluation of new storage technologies.
  • Author RFPs, review vendor responses, and lead "down selection" processes to ensure investment in the best hardware for AI workloads.

Vendor Roadmap Influence

  • Serve as the primary technical point of contact for storage partners (such as VAST Data and Pure Storage).
  • Sit with their engineering teams to provide feedback on bugs and missing features, and to prioritize Crusoe’s requirements on their development roadmaps.

Cross-Functional Collaboration

  • Work closely with service engineering and architecture teams to influence design decisions.
  • Provide performance guidance during feature development and release cycles.
  • Communicate findings to both technical and non-technical stakeholders.

What You'll Bring to the Team

  • 10+ years of experience in storage systems administration with a heavy focus on petabyte-scale, on-premises data environments.
  • Strong understanding of storage architectures (block, file, object) and I/O paths.
  • Hands-on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.).
  • Experience with SSDs, NVMe, RAID, caching, or distributed storage systems.
  • Deep familiarity with enterprise flash arrays and distributed file systems. Specific experience with VAST Data or Pure Storage (Everpure) is highly preferred.
  • Proficiency with scripting (Python, Go, or Bash) to automate array management and monitoring.
  • Ability to analyze complex performance data and present clear conclusions.
  • Proven ability to lead the authoring of technical requirements, evaluate RFP responses, and manage complex vendor relationships.
  • Experience with system design for specific I/O use cases (AI training/inference) and a disciplined approach to testing and validating new vendor releases.

Bonus Points

  • Experience with RDMA, iSCSI, NVMe-oF, RoCEv2, or InfiniBand networking as it relates to high-performance storage.
  • Previous experience at a major Cloud Service Provider (CSP) or a high-scale AI infrastructure company.
  • Familiarity with distributed storage systems (Ceph, Lustre, Gluster, etc.).

Benefits & Compensation

Compensation Range: $180,000 - $225,000 + Bonus. Restricted Stock Units are included in all offers.

Skills
VAST Data, Pure Storage, NVMe, FIO, ElBencho, blktrace, nvme-cli, eBPF, Python, Go, bash, SSDs, RAID, Ceph, Lustre