OS / K8s Systems Engineer
165k – 330k | San Francisco, CA | New York, NY | DevOps / SRE | Hybrid
Summary
Build automation and systems to provision and orchestrate GPU hardware into scalable Kubernetes clusters. Requires deep Linux expertise, provisioning experience, and strong programming in Python/Go.
About the role
Responsibilities
- Own the end-to-end automation of cluster bring-up and lifecycle management.
- Build and maintain OS images, provisioning systems, and configuration pipelines.
- Deploy and operate cluster orchestration platforms (Kubernetes, Slurm, or similar).
- Design systems for reproducibility across sites and hardware generations.
- Automate upgrades, rollouts, and failure recovery.
- Optimize system performance, including GPU utilization and networking.
- Partner with hardware and network teams to validate and improve system behavior.
Requirements
- Experience building and operating automated infrastructure systems.
- Strong programming skills (Python, Go, or similar).
- Deep familiarity with Linux systems, including boot processes, drivers, and performance.
- Experience with provisioning systems (PXE, imaging, configuration management).
- Experience with Kubernetes.
- Strong debugging skills across system layers (hardware → OS → network).
- Experience working with GPU or high-performance workloads is a plus.
Skills
Kubernetes, Linux, Python, Go, PXE, GPU, Provisioning, Configuration Management, Slurm, Debugging