Senior DevOps Engineer/Site Reliability Engineer

Seeking a Senior DevOps/Site Reliability Engineer to build, operate, and scale reliable cloud-native infrastructure and distributed data platforms. This role requires expertise in Kubernetes, cloud infrastructure, observability, automation, CI/CD, and incident management.

165k – 215k · United States · DevOps / SRE · Remote · 5+ YOE

About the role

Key Responsibilities

  • Administer and maintain Kubernetes clusters and containerized workloads.
  • Manage cloud infrastructure in OCI, AWS, GCP, or Azure environments.
  • Develop and maintain CI/CD pipelines for reliable application deployments.
  • Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
  • Build automation tooling and operational workflows using Python, Go, or Bash.
  • Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
  • Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
  • Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.
  • Improve platform reliability, scalability, and operational efficiency using SRE best practices.
  • Collaborate with cross-functional teams across multiple time zones.
  • Perform Linux system administration and network troubleshooting.
  • Contribute to incident response processes, postmortems, and reliability improvements.
  • Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
  • Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

Requirements

  • 5+ years of experience in DevOps, SRE, or Platform Engineering roles.
  • Strong expertise with Kubernetes, Docker, and container orchestration.
  • Hands-on experience managing production cloud environments.
  • Strong Infrastructure as Code experience with Terraform and Helm.
  • Experience with CI/CD tools and deployment automation.
  • Advanced troubleshooting skills in Linux systems, networking, and distributed systems.
  • Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack.
  • Strong programming and scripting skills in Python, Bash, or Go.
  • Experience supporting high-availability production systems and on-call operations.
  • Knowledge of incident management and reliability engineering practices.
  • Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB.
  • Understanding of AI-driven operational tooling and automated remediation concepts.
  • Excellent communication, collaboration, and problem-solving skills.
  • Must reside on the U.S. East Coast.

Benefits

We pride ourselves on recognizing our employees. Here are some examples of our benefits program:

  • Pre-IPO Stock Options
  • Medical, Dental & Vision care
  • 401(k)
  • Employee Assistance Program
  • Employee Discount Program
  • Life Insurance
  • Paid time off
  • Referral Program
  • Rewards and Recognition Program

The base compensation range for this role is USD 165,000–215,000 per year. Total compensation includes a bonus opportunity and equity and will vary based on candidate location.

Skills

Kubernetes · Docker · Terraform · Helm · CI/CD · Python · Go · Bash · Prometheus · Grafana · Kafka · Elasticsearch · Spark · Redis · MongoDB

Similar roles

DevOps / SRE jobs

Sr. Platform Engineer I

Designs and builds developer tools, workflows, and CI/CD pipelines to boost engineering productivity across web, mobile, and desktop platforms. Requires 8+ years in platform engineering with expertise in Kubernetes, Docker, IaC, and cloud platforms.

165k – 180k · Chicago, IL +23 · DevOps / SRE · Hybrid · 8+ YOE · Go · AWS

Lead Engineer, Lithography

Leads lithography process development for quantum qubit manufacturing, owning the wet process stack, optimizing beam-resist interactions, and driving the roadmap from R&D to production. Requires a PhD with 7+ years or an MS with 10+ years in lithography, plus hands-on expertise with tools and statistical analysis.

165k – 205k · Fremont, CA · DevOps / SRE · On-site · 7+ YOE · SPC · DOE

Senior API Integration Engineer

Leads design and delivery of enterprise API integrations using Workato for the People Tech ecosystem, automating workflows across ERP, CRM, and HCM systems. Requires 7+ years of experience with Workato recipes, iPaaS patterns, API security, and stakeholder collaboration.

165k – 200k · Sunnyvale, CA +1 · DevOps / SRE · On-site · 7+ YOE · XML · JSON

Senior Software Engineer, Platform

Build and scale backend platform systems including real-time EHR integrations, data lakes from TB to PB scale, and core infrastructure for 100x growth. Requires 5+ years in scalable backends, Kubernetes, AWS, PostgreSQL, and DevOps practices.

165k – 240k · San Francisco, CA · DevOps / SRE · On-site · 5+ YOE · AWS · DevOps

Senior Platform Engineer

Build and automate platform infrastructure using Python, Terraform, and AWS to reduce developer toil and enhance observability in a scaling startup environment. Advocate for best practices and learning from incidents, with a strong automation mindset.

165k – 235k · United States · DevOps / SRE · Remote · AWS · RDS