Staff Software Engineer (Technical Lead), Storage
204k – 255k | United States | Remote | 9+ YOE
Summary
Staff-level infrastructure engineer leading teams that build and operate Airbnb's critical KV stores, caching layers, coordination services, and data ingestion pipelines at massive scale.
About the role
Responsibilities
- Own and operate a highly available, low-latency, distributed, multi-tenant KV store serving millions of read QPS at 99.9%+ availability.
- Manage control planes and clients for ElasticCache clusters handling million+ IOPS and indexing QPS.
- Operate a scalable, reliable, performant distributed coordination service supporting MySQL, Redis, Kafka, Flink, Druid, Zookeeper, and other systems.
- Build and operate managed data export solutions including near real-time CDC and periodic mutation/full table snapshots.
- Lead a team of developers to deliver multi-quarter cross-functional projects.
- Stay current with data ingestion systems and evaluate/incorporate new technologies to improve architecture.
- Influence team and organizational long-term roadmap and strategy.
- Mentor and coach team members to enhance skills and technical standards.
- Raise operational standards by proactively identifying, debugging, and fixing issues; participate in on-call rotation.
Requirements
- 9+ years of relevant industry experience.
- Proven track record of leading and mentoring engineering teams, setting technical direction, and growing engineers.
- Deep expertise in distributed systems, multi-tenant storage, and infrastructure; experience architecting and scaling high-performance, business-critical systems.
- Demonstrated ability to collaborate and influence across teams, building alignment on technical strategy.
- Strong judgment on technical trade-offs balancing short-term delivery with long-term maintainability.
- Experience onboarding to and navigating complex codebases and enabling others to do the same.
Skills
Distributed Systems, KV Stores, Caching, ElasticCache, Redis, Kafka, MySQL, Flink, Druid, Zookeeper, CDC, Data Ingestion, Infrastructure, Multi-tenant Storage