Staff Software Engineer, Infrastructure
Hands-on Infrastructure Tech Lead building and scaling AWS cloud infrastructure from scratch for an AI-driven enterprise analytics platform. Owns architecture, IaC, security/compliance (SOC 2), and operational excellence.
What you'll do
- Hands-on platform building. Architect and implement foundational cloud infrastructure from scratch: compute, networking, CI/CD, and observability. Actively write infrastructure-as-code and ship production systems.
- Own the infrastructure architecture. Define and execute the technical vision and multi-year roadmap for our AWS-based platform (ECS/Fargate, containerized Python and Node services).
- Run durable, AI-heavy workloads. Operate and scale our workflow orchestration layer (Temporal), streaming pipelines, vector search infrastructure, and high-throughput LLM inference paths.
- Design for security and compliance. Build infrastructure that meets SOC 2 requirements from day one: multi-tenant isolation, secrets management, least-privilege IAM, audit logging, and encrypted data flows.
- Establish operational excellence. Set and uphold standards for IaC, deployment pipelines, incident response, SLOs, and on-call practices; mentor engineers.
- Cross-functional collaboration. Partner with product, backend, frontend, and enterprise customers to translate requirements into pragmatic infrastructure solutions.
What you bring
- 7+ years building and scaling mission-critical cloud infrastructure on AWS and/or GCP, with demonstrated experience architecting platforms from the ground up.
- Production experience with ECS/Fargate, Kubernetes, or equivalent, including service networking, autoscaling, zero-downtime deploys, and multi-environment release strategies.
- Strong command of Terraform, Pulumi, or CloudFormation, plus CI/CD pipeline design (GitHub Actions or similar) and GitOps workflows.
- Familiarity with operating Postgres at scale (RDS, Supabase, or self-managed), Redis, message/workflow systems (Temporal, SQS, Kafka), and ideally vector databases or LLM serving infrastructure.
- Practical experience with SOC 2 (or similar) compliance programs, IAM design, VPC architecture, secrets management, and multi-tenant data isolation.
- Execution-driven mindset with end-to-end ownership of systems.
Senior Network & Site Reliability Engineer
Design, operate, and automate the global network and reliability layer for a high-performance NVIDIA DGX SuperPOD supporting ML workloads. Own architecture, observability, incident response, and security for mission-critical infrastructure.
Senior Software Engineer - Observability Visibility
Senior engineer building observability and resilience standards, tooling, and automation to make reliability the default across Datadog services. Requires 5+ years of experience, Go/Python skills, and experience delivering AI features.
Senior Manager, DevOps Engineering
Lead and mentor a team of DevOps and Infrastructure Engineers responsible for build pipelines, CI/CD systems, developer tooling, and release infrastructure across Hivemind Solutions. Drive modernization of C++/Python build ecosystems and ensure scalable, secure software delivery pipelines.
Staff Software Engineer
Staff Software Engineer owning technical strategy and systems for Coinbase's test infrastructure at scale. Focus on fast, reliable test signals through orchestration, smart selection, sharding, and flakiness remediation.
Staff Engineer, AI Productivity
Staff-level engineer building infrastructure, tooling, and documentation to make AI coding agents dramatically more productive across the codebase. Owns agentic dev environments, MCP integrations, and agent context.