
Senior Software Engineer, Platform

San Francisco, CA | DevOps / SRE | Remote | 5+ YOE

Summary

Senior engineer building and operating Astronomer's high-scale PaaS. Owns testing, deployment, reliability, and observability for Astro and related products.

About the role

What you get to do

  • Make high-quality decisions, grounded in data and experience, on how we build the current and next generation of our production platform, then deliver the results.
  • Own and shape how we test, build and deploy code in a high-scale PaaS environment.
  • Collaborate across the whole company on how we design production systems, set standards and make technology choices for new and existing products, and how these fit together.
  • Deliver results - we routinely “change the wheels on the bus while it’s moving”, in a predictable, safe and reliable way.
  • Be at the forefront of how we work together as a Platform Engineering team.
  • Blaze a Trail: Join a small but growing team building out the Platform/Reliability practice for the company. This role reports directly to the VP of Reliability.
  • Be an Owner: Be directly involved in decision-making on what we work on, as well as how we work on it. Make promises, and keep them.
  • Do Sensible Things: Be directly involved in determining how our platform works. Participate in incident management and determine sensible practices as the platform evolves.
  • Garage Door Open: Create and maintain comprehensive internal documentation for systems and processes, ensuring clarity and accessibility.

What you bring to the role

  • Strong experience in Non-Abstract Systems design and implementation.
  • Strong proficiency in Python and Golang, and in-depth experience with Kubernetes (CKA level or above).
  • Experience with observability principles and technologies, including SLI/SLO definition and tracking.
  • Strong communication skills, both written and verbal, with experience delivering work as part of a globally distributed team.
  • A passion for reliability and operational excellence. A low tolerance for toil and other nonsense.
  • Ability to estimate the scope of work accurately and coordinate with stakeholders to address risks and ensure successful project delivery.
  • Experience with (and ideally strong opinions on) software development best practices, such as code review, testing, CI/CD, version control, automation and debugging.
  • Proactive approach to identifying and addressing issues, with a focus on ownership and accountability.

Bonus points if you have

  • Experience working on a SaaS/PaaS product across multiple cloud providers.
  • Experience with our particular tech stack components and technologies: CircleCI, Chronosphere (Prometheus), Splunk, Bazel, Istio, Playwright, Karpenter, GitHub Actions.
  • Experience with the innards and quirks of AWS, GCP and (particularly) Azure.
  • Experience participating in an on-call rotation. This role involves periodic on-call for the services we own.
  • Experience with Apache Airflow.

Skills

Python, Golang, Kubernetes, CI/CD, Observability, SLI/SLO, AWS, GCP, Azure, CircleCI, Prometheus, Splunk, Bazel, Istio, Playwright