Senior Software Engineer, Cloud Platform

Build and operate the cloud platform powering Zilliz Cloud and Vector Lakebase across multi-cloud environments, integrating control plane, scheduling, and database runtime for scalable AI workloads. Requires 3+ years building production systems, strong Kubernetes and cloud experience, and a bachelor's degree or equivalent.

175k – 225k | Redwood City, CA | DevOps / SRE | Hybrid | 3+ YOE

About the role

What you'll do

Design and build the cloud platform behind Zilliz Cloud and Vector Lakebase, bringing together cloud control plane, database runtime, scheduling, resource management, deployment, and lifecycle management to support fast workload placement, elastic scaling, multi-tenant isolation, and cost-efficient execution
Build cloud-native systems that make distributed database provisioning, scaling, upgrades, recovery, and workload migration automated, observable, rollback-safe, and efficient
Work deeply across Kubernetes, multi-cloud infrastructure, networking, storage, and database engine runtimes to deliver a tightly integrated cloud-and-engine product experience
Improve platform scalability, reliability, performance, and operational simplicity as we grow across customers, regions, tenants, datasets, and AI workloads
Partner with database, reliability, and product engineers to bring new Vector Lakebase capabilities into cloud production safely and quickly
Use AI deeply across the platform engineering workflow, including deployment validation, diagnosis, incident analysis, capacity planning, documentation, code generation, and operational tooling

What we're looking for

3+ years of experience building production systems such as large-scale SaaS platforms, data platforms, AI applications, microservices, or cloud infrastructure
Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
Strong hands-on experience with Kubernetes, Docker, and at least one major cloud platform such as AWS, GCP, or Azure
Familiarity with infrastructure automation and cloud operations tooling such as Terraform, Helm, Argo CD, Prometheus, Grafana, CI/CD systems, or similar tools
Experience building cloud-native platform systems is a strong plus, including scheduling, orchestration, deployment, configuration, upgrades, lifecycle management, or resource management
Understanding of distributed databases or database engine internals is a strong plus, especially around scalability, performance, reliability, and multi-tenant isolation
Strong interest in AI-assisted development and engineering productivity. We value engineers who actively use AI to multiply their output across coding, debugging, testing, documentation, and operations

How we operate

High ownership: You own platform outcomes end-to-end, from design to production behavior, not just a narrow slice of the system
AI-first engineering: We actively use AI to improve coding, testing, documentation, diagnosis, and operations, but human engineering taste still matters most
Fast and focused: We ship often while keeping a high bar. This team suits engineers who want speed, autonomy, and a steep growth curve
Global collaboration: We work closely with engineering teams across APAC and the US, designing collaboration around timezone coverage to support customers globally

Benefits

Competitive compensation (cash + equity)
Regular bonus and equity refresh opportunities
Medical, dental, and vision insurance
Paid time off, including vacation, sick leave, and global reset/wellbeing days
Generous 401(k) and regional retirement plans

Skills

Kubernetes, Docker, AWS, GCP, Azure, Terraform, Helm, Argo CD, Prometheus, Grafana, CI/CD
