Infrastructure Engineer

Build and operate cloud-native infrastructure on GCP and Kubernetes. Own CI/CD, Terraform, observability, and security for a fast-growing onchain finance platform.

New York, NY | DevOps / SRE | Remote


About the role

What you'll do

Support the application teams: turn around infra requests (permissions, roles, service setup, project peering) so product engineers stay focused on shipping.
Own CI/CD and deployments: maintain and extend our GitHub Actions workflows and help migrate toward a dedicated CD tool with proper permissioning — the goal is fully automated, locked-down deploys via service accounts, no direct engineer access to production.
Build and maintain infrastructure as code: author and update Terraform modules for new and existing services across GCP environments.
Run Kubernetes the right way: manage service deployments via Helm (we're on Helm 4) and keep async workloads healthy on Dagster.
Unify observability (likely first project): consolidate today's per-team alerting into a single view — system-to-system dashboards plus incident alerting that routes upstream service/vendor failures to the right impacted teams and on-call rotations.
Advance resilience: help move us toward a fully region- and cloud-agnostic posture so services can pick up and move if something fails.
Strengthen security & access: apply IAM, secrets management, least privilege, and auditability; contribute to SOC 2 readiness.
Automate with AI: build agent skills / agents.md so routine tasks (provisioning access, simple changes) can be handled by an agent instead of human engineering hours, and use AI to reason through bigger problems.

What you bring

Strong software-engineering fundamentals in at least one production language (Python, Go, TypeScript, or Rust); Python especially valued, plus comfort scripting and working in the shell.
Hands-on experience with cloud infrastructure and core cloud services, especially GCP (AWS/Azure transferable).
Experience operating large-scale Kubernetes production systems.
Experience with Infrastructure as Code, especially Terraform.
Familiarity with CI/CD systems, especially GitHub Actions or Octopus Deploy.
Ability to debug production issues using logs, metrics, traces, shell tools, and source code.
Security and access-control fundamentals: IAM, secrets management, least privilege, and auditability.
Clear written communication around incidents, design decisions, and operational procedures.

Bonus points

Supporting SOC 2 controls: evidence collection, access reviews, change management, or audit readiness.
Observability with Datadog, Prometheus, Grafana, OpenTelemetry, Honeycomb, or similar.
Improving developer experience through internal tooling, templates, scripts, or platform APIs.
Incident response experience, including postmortems and follow-up remediation.
Experience with Dagster, Helm 3+, high-scale CD tooling (Bazel, Octopus), or AI/agent-assisted ops.
Basic web3 / DeFi literacy (transactions, wallets) and genuine curiosity about onchain.

Skills

Python, Go, TypeScript, Rust, GCP, Kubernetes, Terraform, GitHub Actions, Helm, Dagster, IAM, SOC 2, Datadog, Prometheus, Grafana
