Staff Software Engineer, Infrastructure

200k – 300k · San Francisco, CA · Hybrid · 7+ YOE
Summary

Hands-on Infrastructure Tech Lead building and scaling AWS cloud infrastructure from scratch for an AI-driven enterprise analytics platform. Owns architecture, IaC, security/compliance (SOC 2), and operational excellence.

About the role

What you'll do

  • Hands-on platform building. Architect and implement foundational cloud infrastructure from scratch: compute, networking, CI/CD, and observability. Actively write infrastructure-as-code and ship production systems.
  • Own the infrastructure architecture. Define and execute the technical vision and multi-year roadmap for our AWS-based platform (ECS/Fargate, containerized Python and Node services).
  • Run durable, AI-heavy workloads. Operate and scale our workflow orchestration layer (Temporal), streaming pipelines, vector search infrastructure, and high-throughput LLM inference paths.
  • Design for security and compliance. Build infrastructure that meets SOC 2 requirements from day one: multi-tenant isolation, secrets management, least-privilege IAM, audit logging, and encrypted data flows.
  • Establish operational excellence. Set and uphold standards for IaC, deployment pipelines, incident response, SLOs, and on-call practices; mentor engineers.
  • Cross-functional collaboration. Partner with product, backend, frontend, and enterprise customers to translate requirements into pragmatic infrastructure solutions.

What you bring

  • 7+ years building and scaling mission-critical cloud infrastructure on AWS and/or GCP, with demonstrated experience architecting platforms from the ground up.
  • Production experience with ECS/Fargate, Kubernetes, or equivalent, including service networking, autoscaling, zero-downtime deploys, and multi-environment release strategies.
  • Strong command of Terraform, Pulumi, or CloudFormation, plus CI/CD pipeline design (GitHub Actions or similar) and GitOps workflows.
  • Familiarity with operating Postgres at scale (RDS, Supabase, or self-managed), Redis, message/workflow systems (Temporal, SQS, Kafka), and ideally vector databases or LLM serving infrastructure.
  • Practical experience with SOC 2 (or similar) compliance programs, IAM design, VPC architecture, secrets management, and multi-tenant data isolation.
  • Execution-driven mindset with end-to-end ownership of systems.
Skills
AWS, GCP, ECS, Fargate, Kubernetes, Terraform, Pulumi, CloudFormation, GitHub Actions, GitOps, Postgres, Redis, Temporal, SQS, Kafka
Similar roles at this salary range
Alembic

Senior Network & Site Reliability Engineer

Design, operate, and automate the global network and reliability layer for a high-performance NVIDIA DGX SuperPOD supporting ML workloads. Own architecture, observability, incident response, and security for mission-critical infrastructure.

210k – 240k · San Francisco, CA · DevOps / SRE · On-site · 8+ YOE · BGP · VPN
Datadog

Senior Software Engineer - Observability Visibility

Senior engineer building observability and resilience standards, tooling, and automation to make reliability the default across Datadog services. Requires 5+ years experience, Go/Python skills, and AI feature delivery experience.

175k – 240k · New York, NY · DevOps / SRE · Hybrid · 5+ YOE · Go · Python
Shield AI

Senior Manager, DevOps Engineering

Lead and mentor a team of DevOps and Infrastructure Engineers responsible for build pipelines, CI/CD systems, developer tooling, and release infrastructure across Hivemind Solutions. Drive modernization of C++/Python build ecosystems and ensure scalable, secure software delivery pipelines.

180k – 280k · Washington, DC · DevOps / SRE · On-site · 7+ YOE · Nix · CMake
Coinbase

Staff Software Engineer

Staff Software Engineer owning technical strategy and systems for Coinbase's test infrastructure at scale. Focus on fast, reliable test signals through orchestration, smart selection, sharding, and flakiness remediation.

218k – 257k · United States · DevOps / SRE · Remote · 10+ YOE · Go · AWS
Hightouch

Staff Engineer, AI Productivity

Staff-level engineer building infrastructure, tooling, and documentation to make AI coding agents dramatically more productive across the codebase. Owns agentic dev environments, MCP integrations, and agent context.

180k – 400k · United States · DevOps / SRE · Remote · 7+ YOE · Go · Devin