Lead DevOps Engineer

Lead DevOps Engineer owning multi-cloud SaaS infrastructure, CI/CD automation, and observability, and mentoring engineers. Requires 5-7+ years of experience with Kubernetes, Terraform, cloud platforms, and production environments.

Somerville, MA | DevOps / SRE | Hybrid | 5+ YOE

About the role

Key Responsibilities

Own the deployment, health, and continuous improvement of Tulip's multi-cloud, multi-region SaaS environments — including clusters spanning the US, Europe, and Asia
Design and evolve cloud architecture to ensure customer availability, stability, and performance as Tulip scales globally
Contribute to and help shape the infrastructure technical roadmap in partnership with engineering leadership
Own and continuously improve Tulip's CI/CD infrastructure, driving toward a fully automated, human-interaction-free software delivery lifecycle
Build automation tooling and internal systems that reduce operational toil and increase developer velocity
Define and maintain observability standards across Tulip's cloud environments, including metrics, alerting, logging, and distributed tracing
Proactively identify performance degradation and capacity risks before they impact customers; lead incident response and drive root cause analysis
Mentor and coach junior and mid-level engineers through code reviews, pairing sessions, and regular technical guidance
Serve as a close partner to application engineering teams throughout the SDLC, providing infrastructure guidance and support
Participate in the on-call rotation and help establish on-call best practices that scale as the team grows

Requirements

5-7+ years of hands-on DevOps or Infrastructure Engineering experience, with demonstrated ownership of production cloud environments at scale
Proficiency with modern cloud infrastructure tooling — experience with Kubernetes, Helm, Terraform, Ansible, and major cloud providers (AWS and/or Azure)
Proven experience mentoring and coaching engineers — whether formally or informally
Experience managing enterprise-grade data persistence layers, including NoSQL and SQL databases, key/value stores, and messaging systems (e.g., AMQP, MQTT)
Familiarity with observability and monitoring tooling (e.g., Prometheus, Mimir, Thanos, Grafana) and a strong understanding of what good SRE practice looks like in a fast-growing SaaS environment
Comfort driving team rituals — sprint planning, standups, retrospectives — and contributing to a high-performing team culture
Exposure to modern programming or scripting languages used in infrastructure contexts (e.g., Go, TypeScript, Python, Bash)
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

Nice-to-Haves

Experience with multi-cloud, multi-region SaaS environments
Experience with CI/CD infrastructure automation

Benefits

Company equity
Competitive benefits package including Health, Dental, Vision, Short-term Disability, Long-term Disability, Life Insurance, AD&D Insurance, Flexible Spending Account (FSA), Commuter Benefits, Parental Leave, and 401(k)
Flexible work schedule and unlimited vacation policy
Virtual company events and happy hours
Fitness subsidies

Skills

Kubernetes, Helm, Terraform, Ansible, AWS, Azure, Prometheus, Grafana, Python, Go
