Senior Software Engineer, Infra - Compute Platform

186k – 219k · United States · Remote · 5+ YOE · Jun 10

Summary

Senior engineer owning the Kubernetes-based compute orchestration platform. Builds tooling, automation, and AI-driven workflows to improve reliability and developer experience across Coinbase services.

About the role

Responsibilities

Own the design, build, and operation of Kubernetes cluster management tooling and automation that keeps the compute platform reliable and self-healing at scale.
Build developer-facing tooling and workflows that improve how engineers interact with Kubernetes, with a heavy emphasis on integrating AI-driven processes and support.
Deliver net-new compute capabilities for service owners, such as one-off jobs, cron scheduling, deployment strategies, EFS support, and automated right-sizing.
Drive operational excellence by automating toil, reducing on-call burden, and continuously improving platform observability and incident response.
Partner with Security, Reliability, and Observability teams to ensure the compute platform meets standards for security, uptime, and performance.

Requirements

5+ years of software engineering experience, including 3+ years building and operating Kubernetes or similar compute orchestration systems (e.g., Mesos, Nomad, ECS).
Hands-on experience with AWS and/or GCP infrastructure services (e.g., EC2, EKS, IAM, VPC, networking) in a production environment at scale.
Demonstrated ability to design, implement, and operate distributed infrastructure systems, including diagnosing complex failures and driving them to root-cause resolution.
Hands-on experience with the CNCF ecosystem (e.g., Helm, Prometheus, ArgoCD, Envoy) and a track record of applying these tools to solve real infrastructure problems.
Proven ability to apply AI tooling to infrastructure workflows, improving automation, developer productivity, or operational efficiency.
Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

Skills

Kubernetes, AWS, GCP, Helm, Prometheus, ArgoCD, Envoy, EC2, EKS, IAM

Similar roles at this salary range

Alembic

Jun 12

Senior Network & Site Reliability Engineer

Design, operate, and automate the global network and reliability layer for a high-performance NVIDIA DGX SuperPOD supporting ML workloads. Own architecture, observability, incident response, and security for mission-critical infrastructure.

210k – 240k · San Francisco, CA · DevOps / SRE · On-site · 8+ YOE · BGP, VPN

Komodo Health

Jun 12

Senior Data Engineer, Sentinel (Pacific Time Zone)

Senior Infrastructure Engineer building and operating AWS cloud infrastructure for a healthcare data platform. Requires Python, Terraform, and CI/CD expertise, plus experience with big data tools.

153k – 210k · United States · DevOps / SRE · Remote · 5+ YOE · AWS, VPC

Datadog

Jun 12

Senior Software Engineer - Observability Visibility

Senior engineer building observability and resilience standards, tooling, and automation to make reliability the default across Datadog services. Requires 5+ years of experience, Go/Python skills, and experience delivering AI features.

175k – 240k · New York, NY · DevOps / SRE · Hybrid · 5+ YOE · Go, Python

Shield AI

Jun 12

Senior Manager, DevOps Engineering

Lead and mentor a team of DevOps and Infrastructure Engineers responsible for build pipelines, CI/CD systems, developer tooling, and release infrastructure across Hivemind Solutions. Drive modernization of C++/Python build ecosystems and ensure scalable, secure software delivery pipelines.

180k – 280k · Washington, DC · DevOps / SRE · On-site · 7+ YOE · Nix, CMake

Retool

Jun 11

Software Engineer, Developer Experience

Build internal AI tools and autonomous agents that embed into Retool's engineering workflows to boost developer productivity and reduce toil. Requires experience shipping real AI-powered developer tools and infrastructure.

155k – 315k · San Francisco, CA · DevOps / SRE · Hybrid · 5+ YOE · LLMs, AI agents
