Senior Software Engineer, Infra - Compute Platform

186k – 219k · United States · Remote · 5+ YOE
Summary

Senior engineer owning the Kubernetes-based compute orchestration platform. Builds tooling, automation, and AI-driven workflows to improve reliability and developer experience across Coinbase services.

About the role

Responsibilities

  • Own the design, build, and operation of Kubernetes cluster management tooling and automation that keeps the compute platform reliable and self-healing at scale.
  • Build developer-facing tooling and workflows that improve how engineers interact with Kubernetes, with a heavy emphasis on integrating AI-driven processes and support.
  • Deliver net-new compute capabilities for service owners, such as one-off jobs, cron scheduling, deployment strategies, EFS support, and automated right-sizing.
  • Drive operational excellence by automating toil, reducing on-call burden, and continuously improving platform observability and incident response.
  • Partner with Security, Reliability, and Observability teams to ensure the compute platform meets standards for security, uptime, and performance.

Requirements

  • 5+ years of software engineering experience, including 3+ years building and operating Kubernetes or similar compute orchestration systems (e.g., Mesos, Nomad, ECS).
  • Hands-on experience with AWS and/or GCP infrastructure services (e.g., EC2, EKS, IAM, VPC, networking) in a production environment at scale.
  • Demonstrated ability to design, implement, and operate distributed infrastructure systems, including diagnosing complex failures and driving them to root-cause resolution.
  • Hands-on experience with the CNCF ecosystem (e.g., Helm, Prometheus, ArgoCD, Envoy) and a track record of applying these tools to solve real infrastructure problems.
  • Proven ability to apply AI tooling to infrastructure workflows, improving automation, developer productivity, or operational efficiency.
  • Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.
Skills
Kubernetes, AWS, GCP, Helm, Prometheus, ArgoCD, Envoy, EC2, EKS, IAM