Staff Software Engineer, Platform Infrastructure
$215k – $250k · New York, NY / San Francisco, CA · DevOps / SRE · Hybrid
Summary
The Staff Engineer will build and own scalable, multi-cloud platform infrastructure for Astronomer's DataOps products. The role requires deep Kubernetes and Go expertise, distributed systems knowledge, and multi-cloud experience to ensure reliability at enterprise scale.
About the role
Responsibilities
- Develop and own the platform infrastructure strategy: map out needs, make decisions, and take responsibility for outcomes.
- Decide what to work on and how, make commitments, and deliver on them.
- Conduct principled build vs. buy assessments and advocate for appropriate tools.
- Create and maintain comprehensive internal documentation and decision records.
- Participate in architectural forums and make open, principled decisions.
Requirements
- Deep knowledge of distributed systems, including failure modes, consistency/availability tradeoffs, backpressure, and graceful degradation.
- Operator-level Kubernetes expertise, including how the scheduler and control loops behave under load.
- Strong proficiency in Go for building production systems.
- Multi-cloud experience (AWS, GCP, Azure), including making architectural decisions in production.
- Experience defining requirements and driving technology choices across the engineering organization.
- Strong written and verbal communication for design docs, postmortems, and global teams.
Nice-to-Haves
- Experience with storage primitives (relational vs. object stores).
- Experience working on SaaS/PaaS products across multiple clouds.
- Familiarity with Apache Airflow or other workflow orchestration tools.
Compensation
- Estimated salary: $215,000 - $250,000 based on leveling and geography, plus equity and comprehensive benefits.
Skills
Kubernetes, Go, Distributed Systems, Multi-cloud, AWS, GCP, Azure, Apache Airflow
Similar roles at this salary range
Staff Site Reliability Engineer, Release Engineering
Staff SRE on the Release Engineering team defining and scaling reliability practices, architecting SLO/error-budget programs, and driving progressive delivery and automated safety gates across product engineering.
$208k – $274k · New York, NY · DevOps / SRE · Hybrid · 8+ YOE · Go, SLO
Staff Site Reliability Engineer - Observability
Staff SRE focused on building and scaling a comprehensive observability platform on GCP using Terraform, Splunk, and Grafana. Requires 5+ years of GCP observability experience and strong coding skills in Python or Go.
$194k – $267k · Bellevue, WA (+4 other locations) · DevOps / SRE · Hybrid · 5+ YOE · Go, GKE