Software Engineer: Backend & Infrastructure

Builds and maintains secure, scalable infrastructure for AI-powered agents: deploys ML workloads, manages cloud environments, infrastructure as code, and CI/CD pipelines, and ensures observability and security on AWS/Azure/GCP and Kubernetes.

San Francisco, CA · DevOps / SRE · Onsite

About the role

Responsibilities

  • Designing and maintaining secure, scalable infrastructure for AI-powered agents in production
  • Deploying and optimizing AI-heavy services with high availability and performance requirements
  • Managing infrastructure as code, cloud environments, and CI/CD pipelines
  • Implementing monitoring, observability, and alerting to ensure system reliability
  • Contributing to infrastructure security and best practices

Requirements

  • Experience deploying and productionizing ML or AI-heavy workloads
  • Expertise in building secure, scalable systems on platforms like AWS, Azure, or GCP
  • Deep knowledge of backend systems, networking, and container orchestration (e.g., Kubernetes)
  • Familiarity with infrastructure security principles and compliance (e.g., SOC 2)
  • Mindset of an owner: proactive, hands-on, and driven to solve problems end-to-end

Compensation

  • Competitive compensation
  • Significant equity upside
  • Medical / dental / vision

Skills

AWS, Azure, GCP, Kubernetes, Infrastructure as Code, CI/CD, Monitoring, Observability, Backend Systems, Networking

Similar roles

DevOps / SRE jobs

Software Engineer, Services Platform

Build platform primitives for service provisioning, deploy tooling, workflow orchestration, and service ownership at a fast-scaling AI coding tool company. Requires experience with durable workflows like Temporal and internal dev platforms, plus a strong focus on developer experience and reliability.

San Francisco, CA +1 · DevOps / SRE · On-site · 5+ YOE · CI/CD · Temporal

Software Engineer, Cloud Infrastructure

Build and operate AWS cloud and LLM infrastructure powering RAG, inference, and data pipelines for an aviation AI platform. Requires strong AWS depth, Python data pipelines, and production LLM experience.

135k – 260k · San Carlos, CA · DevOps / SRE · Hybrid · 4+ YOE · AWS · VPC

Software Engineer, Traffic

Design, build, and operate scalable distributed systems and edge networks on AWS to handle Figma's growing customer traffic and services. Requires 4+ years building infrastructure at scale, experience with TypeScript or Go, and distributed/traffic systems.

153k – 376k · San Francisco, CA +1 · DevOps / SRE · Remote · 4+ YOE · Go · AWS

Cloud Engineer - Product Metrics

Design, build, and operate petabyte-scale distributed systems for product metrics using Golang, Kubernetes, and ClickHouse. Requires 5+ years building scalable systems and 2+ years with Golang.

141k – 230k · United States · DevOps / SRE · Remote · 5+ YOE · Go · AWS

Postgres Deployment Engineer

Own stability and deployment of PostgreSQL products. Package software with Nix, manage upgrades, optimize CI/CD, and resolve production issues. Requires 3+ years PostgreSQL experience and Nix proficiency.

United States · DevOps / SRE · Remote · 3+ YOE · C · Go