Senior Software Engineer, Infrastructure

180k – 250k · Boston, MA · DevOps / SRE · Hybrid · 5+ YOE
Summary

Senior Infrastructure Engineer responsible for building and operating platform primitives including Kubernetes, CI/CD, observability, and developer tooling at a high-growth AI and data platform company.

About the role

Responsibilities

  • Steward core platform services: Implement container orchestration, service mesh, ingress, and secrets management at scale.
  • Cross-functional partnership: Collaborate with Product, Engineering, Data, and Security to deliver external and internal value.
  • Harden reliability: Improve observability (logging, metrics, tracing) and automated remediation to increase availability and reduce latency.
  • Automate everything: Use infrastructure-as-code and configuration management to make systems and processes repeatable, auditable, and secure.
  • Scale cost-effectively: Optimize cluster utilization, autoscaling, and storage/networking to balance performance, reliability, and spend.
  • Level-up developer experience: Build internal tooling, templates, and golden paths that reduce cognitive load and time-to-first-deploy for product teams.
  • On-call & incident response: Participate in a sustainable on-call rotation, drive post-mortems, eliminate toil, and reduce MTTR via automation.
  • Enable fast, safe delivery: Evolve CI/CD pipelines (build/test/release) and environment strategies (dev/stage/prod).
  • AI: Build using agentic tools (Claude Code, Codex, etc.) and push the boundaries of agentic development.

Requirements

  • 5+ years of experience in software engineering with a focus on infrastructure, DevOps, and/or platform engineering.
  • Team-focused mindset, with solid collaboration and communication skills and a focus on enabling others.
  • Pragmatic problem-solver who communicates clearly, documents well, and thrives in fast-moving, high-ownership environments.
  • Experience working with cloud infrastructure, specifically Kubernetes.
  • Understanding of observability: metrics, logs, traces, and building actionable alerts/SLOs.
  • Familiarity with infrastructure-as-code tools.
  • Some programming experience in at least one modern programming language.
  • Awareness of security fundamentals: IAM, workload identity, network policies, encryption, and secrets management.

Nice to Haves

  • Open source contributions.
  • Experience at a company transitioning from startup to high growth.
  • Google Cloud Platform.
  • Terraform.
  • Python, Go, and/or JavaScript (TypeScript).
  • Building and managing CI/CD systems and developer tooling.
Skills

Kubernetes, Terraform, Google Cloud Platform, Python, Go, JavaScript, TypeScript, CI/CD, Infrastructure as Code, Observability