Skip to content

Senior Software Engineer - Observability and Reliability

170k – 240kNew York, NYOnsite5+ YOE
Summary

Build observability platforms and tools (metrics, logging, tracing, alerting) using Go, OpenTelemetry, and Kubernetes. Requires 5+ years experience building production software and strong CS fundamentals.

About the role

What You Will Be Doing

  • Build observability tools and platforms, including: metrics, logging, distributed tracing, dashboarding, alerting, application performance management
  • Build with modern tools and languages like Go, Open Telemetry and Kubernetes
  • Participate in on-call rotation and ensure uptime of services
  • Create runtime tools/processes that optimize cloud triaging and limit downtime
  • Define best practices around making our systems and services measurable
  • Collaborate with peers and stakeholders through design and code reviews to ensure best practices amongst available technologies. We expect successful candidates to be coding a majority of their time

Qualifications We Need

  • Strong Computer Science fundamentals
  • 5+ years industry experience building and maintaining high-quality software, especially software other engineers use
  • You apply a product mindset to infrastructure systems and feel accomplished enabling others
  • Desire to be a great teammate and have fun at work
  • Strong sense of craftsmanship, and a healthy academic curiosity

Qualifications We Want (also, skills you’ll learn!)

  • Experience building systems for data analytics
  • Distributed systems monitoring and profiling skills
  • Knowledge of cloud application security models
  • Administered cloud service infrastructure (GCP, AWS, Azure)
  • Startup experience

Compensation & Benefits

  • The base salary range for this position is $170k - $240k annually
  • Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience
  • This role is eligible for stock options, as well as a comprehensive benefits package
  • Equity
  • Generous health benefits
  • Flexible time off policy
  • Paid bonding time for all new parents
  • Traditional and Roth 401k
  • Commuter and FSA benefits
  • Lunch Program
  • Dog friendly office
Skills
GoOpenTelemetryKubernetesGCPAWSAzureDistributed SystemsObservabilityMetricsLoggingDistributed TracingAlertingApplication Performance Management
Similar roles at this salary range
All DevOps / SRE jobs →
Komodo Health

Senior Data Engineer, Sentinel (Pacific Time Zone)

Senior Infrastructure Engineer building and operating AWS cloud infrastructure for healthcare data platform. Requires Python, Terraform, CI/CD expertise, and big data tools experience.

153k – 210kUnited StatesDevOps / SRERemote5+ YOEAWSVPC
Pinterest

Sr. Production Engineer, Solutions Engineering

Senior Production Engineer building AI agents, platforms, and automation to ensure reliability of Pinterest's large-scale distributed systems serving hundreds of millions of users.

140k – 288kChicago, IL +1DevOps / SRERemote5+ YOEGoAWS
Datadog

Senior Software Engineer - Observability Visibility

Senior engineer building observability and resilience standards, tooling, and automation to make reliability the default across Datadog services. Requires 5+ years experience, Go/Python skills, and AI feature delivery experience.

175k – 240kNew York, NYDevOps / SREHybrid5+ YOEGoPython
Shield AI

Senior Manager, DevOps Engineering

Lead and mentor a team of DevOps and Infrastructure Engineers responsible for build pipelines, CI/CD systems, developer tooling, and release infrastructure across Hivemind Solutions. Drive modernization of C++/Python build ecosystems and ensure scalable, secure software delivery pipelines.

180k – 280kWashington, DCDevOps / SREOn-site7+ YOENixCMake
Retool

Software Engineer, Developer Experience

Build internal AI tools and autonomous agents that embed into Retool's engineering workflows to boost developer productivity and reduce toil. Requires shipping real AI-powered developer tools and infrastructure.

155k – 315kSan Francisco, CADevOps / SREHybrid5+ YOELLMsAI agents