Member of Technical Staff, Cloud Infrastructure

Builds and maintains scalable cloud infrastructure, focusing on reliability and performance. Requires expertise in cloud platforms, infrastructure tools such as Terraform and Kubernetes, and systems programming.

175k – 220k | New York, NY | San Mateo, CA | DevOps / SRE | Hybrid

About the role

Responsibilities

Design, build, and maintain cloud infrastructure systems.
Optimize scalability, reliability, and performance of cloud services.
Collaborate with engineering teams on infrastructure as code implementations.

Requirements

Expertise in cloud platforms like AWS, GCP, or Azure.
Proficiency in infrastructure tools such as Terraform, Kubernetes, and Docker.
Strong experience with Linux/Unix systems and networking.

Nice-to-Haves

Experience with CI/CD pipelines.
Knowledge of monitoring tools like Prometheus or Grafana.

Skills

Kubernetes, Terraform, Docker, AWS, GCP, Linux, CI/CD, Prometheus, Grafana, Infrastructure as Code
