# Staff Software Engineer, Infrastructure
**Company:** [Decagon](https://hotfix.jobs/companies/decagon)
**Location:** San Francisco, CA
**Salary:** $300K-$430K
**Experience:** 8+ years
**Skills:** Kubernetes, Terraform, GCP, AWS, Azure, OpenTelemetry, Prometheus, Grafana, Datadog, PagerDuty, GKE, EKS, AKS, GitOps
**Posted:** 2025-12-08
> Designs, builds, and operates high-scale, low-latency production infrastructure services, owning SLOs and end-to-end reliability. Partners with teams to optimize performance, evolve CI/CD, and support diverse deployments. Requires 8+ years of experience, with strong observability and cloud expertise.
## Job Description
## Responsibilities
- Design and implement critical infrastructure services with strong SLOs, clear runbooks, and actionable telemetry.
- Partner with research and product teams to architect solutions, set up prototypes, evaluate performance, and scale new features.
- Tune service latencies: optimize networking paths, apply smart caching/queuing, and adjust CPU/memory/I/O usage for tight p95/p99s.
- Evolve CI/CD, golden paths, and self-service tooling to improve developer velocity and safety.
- Support various deployment architectures for customers with robust observability and upgrade paths.
- Lead infrastructure-as-code (**Terraform**) and GitOps practices; reduce drift with reusable modules and policy-as-code.
- Participate in on-call and drive down toil through automation and elimination of recurring issues.

## Requirements
- 8+ years building and operating production infrastructure at scale.
- Depth in at least one area across Core/Data/AI-ML/Platform/Voice, with curiosity to learn the rest.
- Proven track record meeting high availability and low latency targets (owning SLOs, p95/p99, and load testing).
- Excellent observability chops (**OpenTelemetry**, **Prometheus/Grafana**, **Datadog**) and incident response (**PagerDuty**, SLO/error budgets).
- Clear written communication and the ability to turn ambiguous requirements into simple, reliable designs.

## Nice-to-Haves
- Experience being an early backend/platform/infrastructure engineer at another company.
- Strong **Kubernetes** experience (**GKE/EKS/AKS**) and experience across multiple cloud providers (**GCP**, **AWS**, **Azure**).
- Experience with customer-managed deployments.

## Compensation
- $300K – $430K + equity

**Apply:** https://hotfix.jobs/jobs/staff-software-engineer-infrastructure-at-decagon-6a211fb1-8818-43bd-859d-932a9657c0d3
**Canonical:** https://hotfix.jobs/jobs/staff-software-engineer-infrastructure-at-decagon-6a211fb1-8818-43bd-859d-932a9657c0d3