# Senior DevOps Engineer/Site Reliability Engineer
**Company:** [Stellar Cyber](https://hotfix.jobs/companies/stellar-cyber)
**Location:** Remote
**Salary:** $165K-$215K
**Experience:** 5+ years
**Skills:** Kubernetes, Docker, Terraform, Helm, CI/CD, Python, Go, Bash, Prometheus, Grafana, Kafka, Elasticsearch, Spark, Redis, MongoDB
**Posted:** 2026-06-02
> Seeking a Senior DevOps/Site Reliability Engineer to build, operate, and scale reliable cloud-native infrastructure and distributed data platforms. This role requires expertise in Kubernetes, cloud infrastructure, observability, automation, CI/CD, and incident management.
## Job Description
## Key Responsibilities
* Administer and maintain Kubernetes clusters and containerized workloads.
* Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
* Develop and maintain CI/CD pipelines for reliable application deployments.
* Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
* Build automation tooling and operational workflows using Python, Go, or Bash.
* Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
* Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
* Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.
* Improve platform reliability, scalability, and operational efficiency using SRE best practices.
* Collaborate with cross-functional teams across multiple time zones.
* Perform Linux system administration and networking troubleshooting.
* Contribute to incident response processes, postmortems, and reliability improvements.
* Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
* Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

## Requirements
* 5+ years of experience in DevOps, SRE, or Platform Engineering roles.
* Strong expertise with Kubernetes, Docker, and container orchestration.
* Hands-on experience managing production cloud environments.
* Strong Infrastructure as Code experience with Terraform and Helm.
* Experience with CI/CD tools and deployment automation.
* Advanced troubleshooting skills in Linux systems, networking, and distributed systems.
* Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack.
* Strong programming and scripting skills in Python, Bash, or Go.
* Experience supporting high-availability production systems and on-call operations.
* Knowledge of incident management and reliability engineering practices.
* Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB.
* Understanding of AI-driven operational tooling and automated remediation concepts.
* Excellent communication, collaboration, and problem-solving skills.
* Must reside on the East Coast.

## Benefits
We pride ourselves on recognizing our employees. Our benefits program includes:
* Pre-IPO Stock Options
* Medical, Dental & Vision care
* 401(k)
* Employee Assistance Program
* Employee Discount Program
* Life Insurance
* Paid time off
* Referral Program
* Rewards and Recognition Program

The base compensation range for this role is USD 165,000-215,000 per year. Total compensation includes bonus opportunity and equity, and will vary based on candidate location.
**Apply:** https://hotfix.jobs/jobs/senior-devops-engineer-site-reliability-engineer-at-stellar-cyber-4656dca5-294a-4454-9c42-c312ea2bc535
**Canonical:** https://hotfix.jobs/jobs/senior-devops-engineer-site-reliability-engineer-at-stellar-cyber-4656dca5-294a-4454-9c42-c312ea2bc535