Senior Infrastructure Software Engineer, Storage Core
180k – 274k · United States · Remote · 9+ YOE
Summary
Senior engineer building and operating Dropbox's exabyte-scale distributed storage systems. Focus on replication, erasure coding, performance, and reliability in Go/Rust.
About the role
Responsibilities
- Design, implement, and maintain large-scale distributed storage systems that ensure data durability, availability, and performance.
- Collaborate with peers to evolve the architecture of Dropbox’s core storage infrastructure for improved scalability and efficiency.
- Contribute to the design of replication, erasure coding, and data lifecycle management systems that balance cost, reliability, and performance.
- Write high-quality, performant, and maintainable code in Go and Rust.
- Participate in the on-call rotation, gaining firsthand experience operating Dropbox’s production storage systems.
- Investigate and resolve complex production issues, performing root cause analysis and driving continuous reliability improvements.
- Partner with cross-functional teams (Networking, Hardware, Capacity Planning) to deliver end-to-end reliable and cost-efficient storage solutions.
- Take ownership of scoped projects and demonstrate growth toward leading larger, cross-team technical initiatives.
Requirements
- 9+ years of experience, with a strong understanding of distributed systems principles, including replication, consistency, and fault tolerance.
- Experience developing and debugging production services in C++, Go, or Rust.
- Familiarity with distributed storage systems, file systems, or data infrastructure at scale.
- Demonstrated ability to write efficient, reliable, and maintainable code in mission-critical environments.
- Experience troubleshooting complex systems and participating in on-call or operational rotations.
- Solid communication and collaboration skills, with the ability to work across infrastructure and product teams.
- Eagerness to learn, grow, and contribute to multi-year infrastructure evolution initiatives.
Preferred Qualifications
- Experience building and operating large-scale object storage or distributed storage systems (e.g. S3, Ceph, GFS/Colossus).
- Deep interest in systems performance, profiling, and low-level optimization.
- Familiarity with replication protocols, erasure coding, and data placement algorithms.
- Experience with production monitoring, observability, and incident response workflows.
- Contributions to infrastructure projects, open-source systems, or developer tooling that improved reliability and performance.
Skills
Go, Rust, C++, Distributed Systems, Replication, Erasure Coding, Object Storage, Ceph, Performance Profiling, Observability
Similar roles at this salary range
Staff Site Reliability Engineer - Observability
Staff SRE focused on building and scaling a comprehensive observability platform on GCP using Terraform, Splunk, and Grafana. Requires 5+ years of GCP observability experience and strong coding skills in Python or Go.
194k – 267k · Bellevue, WA +4 · DevOps / SRE · Hybrid · 5+ YOE · Go, GKE
Senior Platform Reliability Engineer
Senior Platform Reliability Engineer establishing reliability standards, observability, and incident response practices across engineering teams. Requires 6+ years operating production systems at scale with AWS, Kubernetes, Terraform, and modern observability tooling.
182k – 250k · San Francisco, CA +2 · DevOps / SRE · Hybrid · 6+ YOE · AWS, EKS