Senior Software Engineer, Infrastructure

180k – 250k | Boston, MA | DevOps / SRE | Hybrid | 5+ YOE | Jun 3

Summary

Senior Infrastructure Engineer responsible for building and operating platform primitives including Kubernetes, CI/CD, observability, and developer tooling at a high-growth AI and data platform company.

About the role

Responsibilities

Steward core platform services: Implement container orchestration, service mesh, ingress, and secrets management at scale.
Cross-functional partnership: Collaborate with Product, Engineering, Data, and Security to deliver external and internal value.
Harden reliability: Improve observability (logging, metrics, tracing) and automated remediation to increase availability and reduce latency.
Automate everything: Use infrastructure-as-code and configuration management to make systems and processes repeatable, auditable, and secure.
Scale cost-effectively: Optimize cluster utilization, autoscaling, and storage/networking to balance performance, reliability, and spend.
Level-up developer experience: Build internal tooling, templates, and golden paths that reduce cognitive load and time-to-first-deploy for product teams.
On-call & incident response: Participate in a sustainable on-call rotation, drive post-mortems, eliminate toil, and reduce MTTR via automation.
Enable fast, safe delivery: Evolve CI/CD pipelines (build/test/release) and environment strategies (dev/stage/prod).
AI: Build using agentic tools (Claude Code, Codex, etc.) and push the boundaries of agentic development.

Requirements

5+ years of experience in software engineering with a focus on infrastructure, DevOps, and/or platform engineering.
Team-focused mindset with solid collaboration and communication skills and a focus on enabling others.
Pragmatic problem-solver who communicates clearly, documents well, and thrives in fast-moving, high-ownership environments.
Experience working with cloud infrastructure, specifically Kubernetes.
Understanding of observability: metrics, logs, traces, and building actionable alerts/SLOs.
Familiarity with infrastructure-as-code tools.
Some programming experience in at least one modern programming language.
Awareness of security fundamentals: IAM, workload identity, network policies, encryption, and secrets management.

Nice to Haves

Open source contributions.
Experience at a company transitioning from startup to high growth.
Google Cloud Platform.
Terraform.
Python, Go, and/or JavaScript (TypeScript).
Building and managing CI/CD systems and developer tooling.

Skills

Kubernetes, Terraform, Google Cloud Platform, Python, Go, JavaScript, TypeScript, CI/CD, Infrastructure as Code, Observability
