Site Reliability Engineer (Senior or Staff), Atlas

The Senior or Staff Site Reliability Engineer maintains and scales the Atlas platform in a multi-cloud environment, focusing on automation, on-call reliability, and collaboration with engineering teams. Requires 5+ years of experience with Linux, cloud providers, and programming languages such as Go, Python, or Ruby.

127k – 249k | New York, NY | DevOps / SRE | Hybrid | 5+ YOE

About the role

Responsibilities

Participate in the development of a reliable and resilient multi-cloud platform that hosts a wide and varied range of business-critical customer applications
Collaborate with service-owning teams to provide internal support, solve technical challenges, and adapt or build tooling to solve novel use cases in a generic fashion
Participate in a 24/7 on-call rotation to swiftly resolve any disruption to our customer-facing Atlas fleet, minimizing downtime and maintaining high availability

Requirements

5+ years of experience running critical systems at scale
Value efficiency in processes and operations, and display a preference for automation over manual processes
Familiarity with a major cloud provider (AWS, Azure, or GCP) and the ability to build and operate systems in a multi-cloud environment
Strong understanding of how to run a large-scale Linux environment, including low-level fundamentals
Firm grasp of at least one modern programming language (Go, Ruby, Python), beyond basic scripting
Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)

Skills

Linux, AWS, Azure, GCP, Go, Ruby, Python, HTTP, TLS, DNS, Kubernetes