
DevOps Engineer

The DevOps Engineer designs, builds, and operates AWS infrastructure using Terraform, CI/CD pipelines, and observability tools, and collaborates with product and backend teams on reliability and security.

Arlington, VA | DevOps / SRE | Onsite

About the role

Responsibilities

  • Design, build, and operate AWS-based infrastructure.
  • Implement and maintain Infrastructure-as-Code for single-tenant and multi-tenant environments using Terraform.
  • Build and maintain deployment and environment automation (Ansible or similar).
  • Own and evolve CI/CD pipelines.
  • Design, implement, and refine observability: metrics, logs, traces, dashboards, and alerting.
  • Partner with application teams on architecture decisions, performance tuning, and operational readiness.
  • Contribute to security and governance: IAM policies, network security, secrets management, and security scanning.
  • Document systems, patterns, and runbooks so others can operate and extend the platform reliably.

Requirements

  • Experience administering infrastructure and operating applications deployed on AWS.
  • Experience using Terraform to manage single-tenant and multi-tenant systems.
  • Strong instincts and practical experience with:
    • IP networking (VPCs, routing, subnets, proxies, DNS).
    • Network security (security groups, NACLs, firewalls).
    • PKI management (TLS certificates, CAs, mTLS, certificate lifecycle).

Nice to Haves

  • Experience with Ansible or another deployment automation/configuration management framework.
  • Experience with GitHub Actions or another CI/CD platform (GitLab CI/CD, CircleCI, etc.).
  • Experience working on reliability projects, such as setting up alert management tools and on-call practices or debugging failures in distributed systems.
  • Experience setting up observability and operations programs, including collecting and representing telemetry in Grafana (dashboards, panels, alerts) and instrumenting applications using OpenTelemetry or other log/metric/trace aggregation frameworks.
  • Experience managing PostgreSQL databases in production (backups, migrations, performance, monitoring).
  • Experience managing a pub/sub or queue technology, such as NATS, RabbitMQ, Kafka, AWS SQS, or Google Pub/Sub.
  • Familiarity with secrets management (AWS SSM/Secrets Manager, Vault, or similar).
  • Experience or strong interest in cybersecurity (threat modeling, hardening, secure defaults).
  • Proficiency in Python or another scripting language for tooling and automation.
  • Experience operating a security scanning tool, such as Trivy (or similar vulnerability/container scanners).
  • Experience or interest working with large datasets (performance, storage trade-offs, retention policies).
  • Experience managing graph databases, such as Neo4j, AWS Neptune, or similar.
  • Experience designing or contributing to runbooks and internal platform documentation for non-infrastructure teams.

Benefits

  • Health: Medical, dental, and vision plan options. Life / AD&D, disability coverage options.
  • Family: Paid parental leave for eligible full-time employees (12 weeks for birthing parents, 4 weeks for non-birthing parents, 6 weeks for adoptive, foster, or intended parents through surrogacy).
  • Vacation: Paid holidays and flexible PTO.
  • Retirement: 401(k) with pre-tax and Roth options. HSA/FSA options, dependent care FSA.
  • At the office: Commuter benefits. On-site garage parking. Bike storage. Building fitness center. Desk setup stipend.

Skills

AWS, Terraform, Ansible, CI/CD, GitHub Actions, Observability, Grafana, OpenTelemetry, Postgres, Python
