Senior Storage Systems Engineer

149k – 161k | San Francisco, CA | Onsite | 5+ YOE | Apr 29

Summary

The Senior Storage Systems Engineer manages VAST Data and Pure Storage flash arrays for high-performance AI/HPC workloads, handling administration, performance monitoring, non-disruptive upgrades, data protection, Tier 3 support, and automation. Requires 5+ years of storage experience, Linux proficiency, and protocol expertise.

About the role

What You'll Be Working On

Flash Array Administration: Own the end-to-end management of VAST Data (Universal Storage) and Pure Storage (FlashBlade/FlashArray) environments, including initial setup, volume provisioning, and export management.
Performance Monitoring: Proactively monitor VAST and Pure clusters for IOPS, throughput, and latency bottlenecks, ensuring storage performance stays ahead of GPU demand.
Non-Disruptive Operations: Execute software upgrades (Purity//FB, VAST OS), expansion of D-Nodes/C-Nodes, and hardware refreshes with zero downtime for our AI customers.
Data Protection: Manage snapshots, replication policies, and data reduction (deduplication/compression) strategies to optimize TCO while ensuring 100% data durability.
Tier 3 Support: Act as the lead technical point of contact for storage incidents, working directly with VAST and Pure support engineering to resolve complex fabric or metadata issues.
Integration & Automation: Use APIs (REST, Python) to automate provisioning and integrate storage health metrics into our centralized observability stack (Grafana/Prometheus).

What You'll Bring to the Team

Technical Experience: 5–8+ years of experience in Storage Administration, including at least 3 years of hands-on experience managing VAST Data or Pure Storage in a production environment.

Protocol Expertise: Deep understanding of NFS over RDMA, SMB, and NVMe-oF, and how they are implemented within VAST and Pure architectures.

Linux Systems Mastery: Strong command of the Linux CLI, specifically for mounting, tuning, and troubleshooting high-performance file systems.

Network Awareness: Understanding of how storage interacts with InfiniBand and RoCE fabrics to ensure low-latency data delivery to GPU nodes.

Scripting Skills: Proficiency in Python, Bash, or similar for automating volume creation, quota management, and reporting via storage APIs.

Operational Discipline: A meticulous approach to capacity planning and documentation, ensuring the environment remains stable as we add petabytes of scale.

Bonus Points

Experience with Pure1 or VAST VMS/Insight for predictive analytics and capacity forecasting.
Familiarity with Slurm or Kubernetes (CSI) integration with high-performance storage.
Prior experience in a "Large Scale" environment (multi-petabyte footprints).

Benefits

Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer contributions to HSA account
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meal allowance
Additional perks & programs specific to location

Compensation Range

Compensation will be paid in the range of $148,500 to $161,000, plus bonus. Restricted Stock Units are included in all offers.

Skills

VAST Data, Pure Storage, FlashBlade, FlashArray, NFS over RDMA, NVMe-oF, Linux, InfiniBand, RoCE, Python, Bash, REST API, Grafana, Prometheus, Kubernetes
