Site Reliability Engineer

Owns production infrastructure for a clinical AI platform, ensuring 99.9%+ stability. Designs and scales Kubernetes-based systems, optimizes TypeScript and Python/ML CI/CD pipelines, and manages Terraform IaC in a high-velocity environment.

200k – 275k | San Francisco, CA | DevOps / SRE | Onsite

About the role

Responsibilities

  • Own the entire production environment and improve the development experience.
  • Design, implement, and maintain the production environment, drawing on prior experience handling 500+ machine deployments.
  • Own containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.
  • Optimize and streamline both TypeScript and Python/ML deployment pipelines to support high-velocity feature releases while maintaining the highest reliability.
  • Support Developer Experience (DevX) work to streamline developer workflows, enhance tool proficiency, and improve CI/CD systems.
  • Manage and maintain infrastructure definitions using Terraform.

Requirements

  • Deep, demonstrable experience with Kubernetes, Helm, and Terraform.
  • Proven ability to architect and maintain complex, distributed systems with high-availability requirements.
  • Hands-on experience optimizing deployment pipelines for both application code (TypeScript) and machine learning models (Python/ML).
  • Experience with PostgreSQL, Redis, Kafka.
  • Excitement about working five days per week in the San Francisco office.
  • Intensity and technical mastery to own mission-critical infrastructure; thrive on owning complex systems, scaling deployments, automating, and problem-solving.

Skills

Kubernetes, Helm, Terraform, TypeScript, Python, CI/CD, Postgres, Redis, Kafka, Infrastructure As Code

Similar roles

DevOps / SRE jobs

Platform Engineer, Model Shaping

Build and operate backend services and infrastructure for model customization and evaluation at Together AI. Requires 3+ years building production infrastructure, strong Python/Go skills, and deep experience with Kubernetes, Linux, and cloud platforms.

200k – 290k | San Francisco, CA | DevOps / SRE | Hybrid | 3+ YOE | Go | AWS

Platform Engineer

Own AWS infrastructure, Pulumi IaC, deployment pipelines, and security baseline for an AI research platform serving financial institutions. First dedicated platform hire defining enterprise deployment, SOC 2 controls, and developer experience.

200k – 280k | New York, NY | DevOps / SRE | On-site | 5+ YOE | AWS | Cdk

SRE/Infrastructure Engineer

Own Terraform, Kubernetes, and cloud infrastructure for a fast-growing AI infrastructure startup. Manage multi-cloud deployments, build reusable infrastructure components, and support enterprise BYOC offerings.

200k – 350k | San Francisco, CA | DevOps / SRE | On-site | 5+ YOE | GCP | AWS

Software Engineer - Infrastructure

Builds and scales reliable cloud infrastructure, deployment systems, observability, and developer tooling to support mortgage market operations. Requires experience with strongly typed languages, PostgreSQL, Kubernetes, and major cloud providers.

200k – 250k | United States | DevOps / SRE | Remote | Go | C#

AI Automation Engineer

Builds AI-powered CI/CD pipelines and automation infrastructure to enable autonomous code generation, testing, and deployment. Collaborates across teams to identify AI opportunities and develops productivity tools, ensuring production reliability.

200k – 230k | New York, NY | DevOps / SRE | Hybrid | AI | CI/CD