Senior Software Engineer, Cloud Platform

Build and operate the cloud platform powering Zilliz Cloud and Vector Lakebase across multi-cloud environments, integrating control plane, scheduling, and database runtime for scalable AI workloads. Requires 3+ years building production systems, strong Kubernetes and cloud experience, and a bachelor's degree or equivalent.

175k – 225k | Redwood City, CA | DevOps / SRE | Hybrid | 3+ YOE

About the role

What you'll do

  • Design and build the cloud platform behind Zilliz Cloud and Vector Lakebase, bringing together cloud control plane, database runtime, scheduling, resource management, deployment, and lifecycle management to support fast workload placement, elastic scaling, multi-tenant isolation, and cost-efficient execution
  • Build cloud-native systems that make distributed database provisioning, scaling, upgrades, recovery, and workload migration automated, observable, rollback-safe, and efficient
  • Work deep across Kubernetes, multi-cloud infrastructure, networking, storage, and database engine runtimes to deliver a tightly integrated cloud-and-engine product experience
  • Improve platform scalability, reliability, performance, and operational simplicity as we grow across customers, regions, tenants, datasets, and AI workloads
  • Partner with database, reliability, and product engineers to bring new Vector Lakebase capabilities into cloud production safely and quickly
  • Use AI deeply across the platform engineering workflow, including deployment validation, diagnosis, incident analysis, capacity planning, documentation, code generation, and operational tooling

What we're looking for

  • 3+ years of experience building production systems such as large-scale SaaS platforms, data platforms, AI applications, microservices, or cloud infrastructure
  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
  • Strong hands-on experience with Kubernetes, Docker, and at least one major cloud platform such as AWS, GCP, or Azure
  • Familiarity with infrastructure automation and cloud operations tooling such as Terraform, Helm, Argo CD, Prometheus, Grafana, CI/CD systems, or similar tools
  • Experience building cloud-native platform systems is a strong plus, including scheduling, orchestration, deployment, configuration, upgrades, lifecycle management, or resource management
  • Understanding of distributed databases or database engine internals is a strong plus, especially around scalability, performance, reliability, and multi-tenant isolation
  • Strong interest in AI-assisted development and engineering productivity. We value engineers who actively use AI to multiply their output across coding, debugging, testing, documentation, and operations

How we operate

  • High ownership: You own platform outcomes end-to-end, from design to production behavior, not just a narrow slice of the system
  • AI-first engineering: We actively use AI to improve coding, testing, documentation, diagnosis, and operations, but human engineering taste still matters most
  • Fast and focused: We ship often while keeping a high bar. This team suits engineers who want speed, autonomy, and a steep growth curve
  • Global collaboration: We work closely with engineering teams across APAC and the US and organize our collaboration around time-zone coverage to support customers globally

Benefits

  • Competitive compensation (cash + equity)
  • Regular bonus and equity refresh opportunities
  • Medical, dental, and vision insurance
  • Paid time off, including vacation, sick leave, and global reset/wellbeing days
  • Generous 401(k) and regional retirement plans

Skills

Kubernetes, Docker, AWS, GCP, Azure, Terraform, Helm, Argo CD, Prometheus, Grafana, CI/CD
