Staff Infrastructure Software Engineer, Enterprise AI

Builds and scales multi-cloud infrastructure for enterprise Agentic AI workflows, focusing on security, compliance, observability, and developer tools. Requires 5+ years of experience with modern infrastructure practices, cloud providers, and languages such as Python.

216k – 270k | New York, NY | San Francisco, CA | DevOps / SRE | Hybrid | 5+ YOE

About the role

What You'll Do:

  • Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers.
  • Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies.
  • Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response.
  • Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization.
  • Solve the toughest engineering problems related to multi-tenancy, data isolation, and high-performance inference at massive scale, taking end-to-end ownership across the full product lifecycle.

What We're Looking For:

  • Proven experience in a senior role, with 5+ years of full-time software engineering experience.
  • Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm charts), container orchestration (e.g., Kubernetes), and observability platforms (e.g., Datadog, Prometheus, Grafana).
  • Extensive experience with at least one major cloud provider (AWS, Azure, or GCP).
  • Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups.
  • Proficiency in Python or JavaScript/TypeScript, and SQL.
  • Bonus points: Hands-on experience and a passion for working with Agents, LLMs, vector databases, and other emerging AI technologies.

Skills

Kubernetes, Terraform, Helm, Datadog, Prometheus, Grafana, AWS, Azure, GCP, Python

Similar roles

DevOps / SRE jobs

Staff Software Engineer, Network Automation

Design and deliver automation frameworks, observability platforms, and self-healing workflows for Crusoe's global network fleet. Requires 8+ years network engineering experience with strong Python/Go skills and expertise in model-driven automation.

215k – 260k | San Francisco, CA +2 | DevOps / SRE | On-site | 8+ YOE | Go | BGP

Staff Software Engineer, Platform Infrastructure

Staff engineer builds and owns scalable multi-cloud platform infrastructure for Astronomer's DataOps products. Requires deep Kubernetes/Go expertise, distributed systems knowledge, and multi-cloud experience to ensure reliability at enterprise scale.

215k – 250k | New York, NY +1 | DevOps / SRE | Hybrid | Go | AWS

Staff Site Reliability Engineer

Leads infrastructure transformation from monoliths to scalable microservices at massive scale, architects observability/CI/CD systems, unifies complex stacks, and mentors engineers. Requires 10+ years coding internal tools, 5+ years cloud (GCP/AWS), Bachelor's in CS.

218k – 260k | Mountain View, CA | DevOps / SRE | On-site | 10+ YOE | GCP | AWS

Staff Software Engineer, Core Reliability

Staff engineer on the Infra Reliability team improving system resiliency, deployment safety, and configuration management for Coinbase's production environment at massive scale.

218k – 257k | United States | DevOps / SRE | Remote | 7+ YOE | Go | AWS

Staff Software Engineer

Staff Software Engineer owning technical strategy and systems for Coinbase's test infrastructure at scale. Focus on fast, reliable test signals through orchestration, smart selection, sharding, and flakiness remediation.

218k – 257k | United States | DevOps / SRE | Remote | 10+ YOE | Go | AWS