# DevOps Engineer
**Company:** [Phonely](https://hotfix.jobs/companies/phonely)
**Location:** San Francisco, CA
**Salary:** $180K-$260K
**Experience:** 5+ years
**Skills:** Kubernetes, AWS, Terraform, CI/CD, GitOps, Python, FastAPI, Argo CD, Datadog, Redis, Postgres, GCP, Pulumi, CloudFormation, IAM
**Posted:** 2026-04-08
> Designs, builds, and operates reliable cloud infrastructure for real-time voice AI systems. Owns Kubernetes clusters, CI/CD pipelines, observability, and security using AWS and IaC tools. Requires 5+ years of DevOps experience with strong Python and async programming skills.
## Job Description
## Responsibilities
- Design, build, and operate highly reliable cloud infrastructure that powers real-time voice AI systems with extremely low latency and high availability.
- Own Kubernetes clusters end-to-end: provisioning, scaling, upgrades, networking, and debugging production incidents under real customer load.
- Build, maintain, and evolve infrastructure as code using tools like **Terraform**, **Pulumi**, or **CloudFormation** to ensure repeatable, auditable, and secure environments across staging and production.
- Create and operate **CI/CD** pipelines that enable fast, safe iteration across multiple microservices and teams.
- Design and maintain observability systems (**metrics**, **logs**, **traces**, **alerting**) to detect failures early and rapidly diagnose production issues.
- Partner with backend engineers to translate application requirements into scalable, secure infrastructure and clean deployment workflows.
- Harden systems through strong security practices including **IAM**, **secrets management**, **network isolation**, and least-privilege access controls.
- Optimize cloud performance and costs while maintaining reliability, developer velocity, and customer experience.
- Implement and operate **GitOps**-driven deployment workflows, using Git as the source of truth for infrastructure and application state, enabling safe, auditable, and automated rollouts.
- Lead incident response: investigate outages, coordinate fixes, write postmortems, and drive systemic reliability improvements.
- Continuously improve resilience through load testing, chaos testing, capacity planning, and proactive infrastructure upgrades.

## Qualifications
- 5+ years as a **DevOps** engineer
- Experience writing async web apps using **FastAPI** in **Python**
- Track record of building **APIs**, **cloud infrastructure**, and **CI/CD** pipelines
- Experience with **IaC**, **AWS**, and **database management** at scale
- Solid understanding of sound architecture and security practices
- Strong technical and communication skills
- Extensive experience with **AWS** and **Kubernetes**

## Software Stack
- **Backend**: Python, microservices, async programming
- **Cloud & Infrastructure**: AWS, GCP, Kubernetes, Redis, Argo CD, GitOps
- **Databases**: Firebase, Supabase (PostgreSQL)
- **Frontend**: Next.js
- **Observability & Monitoring**: Datadog, logging, metrics, tracing
- **Telephony & Voice AI**: SIP, voice APIs, real-time call handling
- **Other tools & practices**: CI/CD, automated testing, resilient architecture

**Apply:** https://hotfix.jobs/jobs/devops-engineer-at-phonely-beffbf7c-ad86-401d-bb75-3013653d3cf2
**Canonical:** https://hotfix.jobs/jobs/devops-engineer-at-phonely-beffbf7c-ad86-401d-bb75-3013653d3cf2