Platform Operations Engineer

Builds and scales platform infrastructure on AWS EKS with GitOps via ArgoCD, manages CI/CD with GitHub Actions, drives observability using Datadog/Sentry/CloudWatch, and ensures reliability through SLOs and incident response. Requires 3+ years SRE/DevOps experience and Kubernetes expertise.

153k – 170k | San Diego, CA | DevOps / SRE | Remote | 3+ YOE

About the role

Responsibilities

  • Support the Platform Infrastructure - Help manage and scale our container environment on Amazon EKS, implement GitOps workflows using ArgoCD, and maintain CI/CD pipelines through GitHub Actions to ensure that deployments are fast, consistent, and automated
  • Build for Reliability - Define and track SLIs and SLOs, lead incident response including on-call rotations, root cause analysis, and post-mortems, and contribute to disaster recovery planning to keep our systems highly available
  • Drive Observability - Design and maintain our monitoring and logging stack using Datadog, Sentry, and CloudWatch, giving engineering teams clear visibility into system health and performance before problems reach users
  • Shape the Platform's Future - Collaborate on architectural decisions, build internal tooling and self-service workflows that make the platform easier to operate, and contribute meaningfully to how we scale and evolve our infrastructure

Requirements

  • 3+ years in SRE, DevOps, or Cloud Infrastructure
  • Confident working with core AWS services (VPC, IAM, EKS, RDS) and a strong understanding of cloud networking and security best practices
  • Expert in infrastructure as code (IaC) using Terraform, CloudFormation, or Crossplane
  • Proficient with GitHub and GitHub Actions as a core component of your CI/CD and automation pipelines - not just for source control
  • Experienced with running Kubernetes clusters in production and managing application deployments through GitOps workflows (ArgoCD/Flux) and Helm Charts
  • Proficient with observability tooling such as Datadog, Sentry, CloudWatch, or Grafana, including building alerts, dashboards, and log pipelines
  • Experience writing solid Python scripts to glue systems together, automate infrastructure tasks, or handle custom workflows
  • Comfortable working independently in a remote setup, asking questions when needed, and keeping momentum without being micromanaged
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience

Nice to Haves

  • Certifications: AWS, Kubernetes, Terraform, or Python

Benefits

  • Competitive pay with equity options
  • Stellar health care plan options (Medical, Dental & Vision), with FSA, DCFSA, & HSA options
  • Company-sponsored disability & life insurance
  • Unlimited PTO
  • 401(k) + 4% Matching
  • Fully remote work + flexible working hours
  • $750 work-from-home setup budget
  • Paid biannual in-person company summits
  • Quarterly $150 co-hanging stipend to meet up with coworkers
  • Monthly $100 health and wellness benefit
  • Generous paid family leave
  • Annual $1,200 learning & development stipend

Skills

AWS, EKS, Kubernetes, Terraform, Argo CD, GitHub Actions, Datadog, Sentry, CloudWatch, Python, GitOps, Helm, Grafana
