Senior Storage Systems Engineer
The Senior Storage Systems Engineer manages VAST Data and Pure Storage flash arrays for high-performance AI/HPC workloads, handling administration, performance monitoring, non-disruptive upgrades, data protection, Tier 3 support, and automation. Requires 5+ years of storage experience, Linux proficiency, and protocol expertise.
What You'll Be Working On
- Flash Array Administration: Own the end-to-end management of VAST Data (Universal Storage) and Pure Storage (FlashBlade/FlashArray) environments, including initial setup, volume provisioning, and export management.
- Performance Monitoring: Proactively monitor VAST and Pure clusters for IOPS, throughput, and latency bottlenecks, ensuring storage performance stays ahead of GPU demand.
- Non-Disruptive Operations: Execute software upgrades (Purity//FB, VAST OS), expansion of D-Nodes/C-Nodes, and hardware refreshes with zero downtime for our AI customers.
- Data Protection: Manage snapshots, replication policies, and data reduction (deduplication/compression) strategies to optimize TCO while ensuring 100% data durability.
- Tier 3 Support: Act as the lead technical point of contact for storage incidents, working directly with VAST and Pure support engineering to resolve complex fabric or metadata issues.
- Integration & Automation: Use APIs (REST, Python) to automate provisioning and integrate storage health metrics into our centralized observability stack (Grafana/Prometheus).
What You'll Bring to the Team
- Technical Experience: 5–8+ years of experience in Storage Administration, including at least 3 years of hands-on experience managing VAST Data or Pure Storage in a production environment.
- Protocol Expertise: Deep understanding of NFS over RDMA, SMB, and NVMe-oF, and how they are implemented within VAST and Pure architectures.
- Linux Systems Mastery: Strong command of the Linux CLI, specifically for mounting, tuning, and troubleshooting high-performance file systems.
- Network Awareness: Understanding of how storage interacts with InfiniBand and RoCE fabrics to ensure low-latency data delivery to GPU nodes.
- Scripting Skills: Proficiency in Python, Bash, or similar for automating volume creation, quota management, and reporting via storage APIs.
- Operational Discipline: A meticulous approach to capacity planning and documentation, ensuring the environment remains stable as we add petabytes of scale.
Bonus Points
- Experience with Pure1 or VAST VMS/Insight for predictive analytics and capacity forecasting.
- Familiarity with Slurm or Kubernetes (CSI) integration with high-performance storage.
- Prior experience in a "Large Scale" environment (multi-petabyte footprints).
Benefits
- Competitive compensation and equity packages
- Restricted Stock Units
- Paid time off, paid holidays & leave of absence programs
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) retirement plan with company match up to 4% of salary
- Volunteer time off
- Global travel insurance & emergency assistance
- Daily meal allowance
- Additional perks & programs specific to location
Compensation Range
Compensation will be in the range of $148,500 to $161,000, plus bonus. Restricted Stock Units are included in all offers.