Senior DevOps Engineer/Site Reliability Engineer

Seeking a Senior DevOps/Site Reliability Engineer to build, operate, and scale reliable cloud-native infrastructure and distributed data platforms. This role requires expertise in Kubernetes, cloud infrastructure, observability, automation, CI/CD, and incident management.

165k – 215k | United States | DevOps / SRE | Remote | 5+ YOE

About the role

Key Responsibilities

Administer and maintain Kubernetes clusters and containerized workloads.
Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
Develop and maintain CI/CD pipelines for reliable application deployments.
Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
Build automation tooling and operational workflows using Python, Go, or Bash.
Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.
Improve platform reliability, scalability, and operational efficiency using SRE best practices.
Collaborate with cross-functional teams across multiple time zones.
Perform Linux system administration and networking troubleshooting.
Contribute to incident response processes, postmortems, and reliability improvements.
Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

Requirements

5+ years of experience in DevOps, SRE, or Platform Engineering roles.
Strong expertise with Kubernetes, Docker, and container orchestration.
Hands-on experience managing production cloud environments.
Strong Infrastructure as Code experience with Terraform and Helm.
Experience with CI/CD tools and deployment automation.
Advanced troubleshooting skills in Linux systems, networking, and distributed systems.
Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and the Elastic Stack.
Strong programming and scripting skills in Python, Bash, or Go.
Experience supporting high-availability production systems and on-call operations.
Knowledge of incident management and reliability engineering practices.
Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB.
Understanding of AI-driven operational tooling and automated remediation concepts.
Excellent communication, collaboration, and problem-solving skills.
Must reside on the East Coast.

Benefits

We pride ourselves on recognizing our employees. Our benefits program includes:

Pre-IPO Stock Options
Medical, Dental & Vision care
401(k)
Employee Assistance Program
Employee Discount Program
Life Insurance
Paid time off
Referral Program
Rewards and Recognition Program

The base compensation range for this role is USD 165,000-215,000 per year. Total compensation includes bonus opportunity and equity, and will vary based on candidate location.

Skills

Kubernetes, Docker, Terraform, Helm, CI/CD, Python, Go, Bash, Prometheus, Grafana, Kafka, Elasticsearch, Spark, Redis, MongoDB
