xAI DevOps / SRE Jobs

Open DevOps / SRE roles at xAI, pulled live from their hiring system.

14 open | xAI | DevOps / SRE

Most common stacks in the current DevOps / SRE listings: Linux, Python, Bash. Most of these roles are on-site or hybrid; 7% are fully remote.

Latest DevOps / SRE roles at xAI
xAI

Member of Technical Staff

SRE role focused on automating reliability workflows, building observability, and ensuring uptime for multi-data-center AI infrastructure. Requires 5+ years of experience, strong Python skills, Linux expertise, and Kubernetes knowledge.

Memphis, TN | DevOps / SRE | On-site | Rust | Linux
xAI

Sr. Software Engineer

Sr. Software Engineer focused on automating reliability workflows, observability, and incident response across multi-data-center AI infrastructure. Requires strong Python skills, Linux expertise, and 3+ years of SRE/infrastructure experience.

Memphis, TN | DevOps / SRE | On-site | Rust | Linux
xAI

Water Treatment Engineer

Designs and optimizes water treatment systems for AI supercomputing facilities' cooling infrastructure, ensuring reliability, efficiency, and compliance. Requires bachelor's in engineering and 1+ years in industrial water treatment, with onsite work in Memphis region.

Memphis, TN | DevOps / SRE | On-site | EPA | WUE
xAI

Member Of Technical Staff - Cloud Infrastructure

Designs, builds, and operates secure, scalable infrastructure including Kubernetes clusters and GPU hardware for large-scale AI workloads in classified US government environments. Requires 5+ years experience, Top Secret clearance, and expertise in IaC tools like Terraform and Ansible.

180k – 440k | Palo Alto, CA +1 | DevOps / SRE | On-site | Go | GPU
xAI

BIM Manager

Leads BIM standards, model coordination, and document control for capital projects, providing hands-on Revit drafting for MEP systems. Ensures design quality and owner interests from design through construction handover. Requires 7+ years BIM/Revit experience with MEP focus.

Memphis, TN | DevOps / SRE | Remote | BIM 360 | AutoCAD
xAI

Controls Engineer

Controls Engineer implements and optimizes data center control systems including BMS, EPMS, SCADA, PLCs, and HMIs to ensure stability and efficiency. Requires 5+ years experience, bachelor's in engineering, Ignition SCADA, and Siemens TIA Portal proficiency.

Memphis, TN | DevOps / SRE | On-site | BMS | HMI
xAI

Senior IT Systems Engineer

Senior IT Systems Engineer leads design, implementation, and optimization of SaaS platforms like Okta and Google Workspace, advances IAM programs, drives automation, and troubleshoots complex issues in hybrid environments. Requires 8+ years experience, IAM expertise, and scripting proficiency.

184k – 276k | Palo Alto, CA | DevOps / SRE | Hybrid | n8n | AWS
xAI

AI/HPC Network Development Engineer - Networking

Develops and optimizes high-performance Ethernet networks for massive AI/HPC GPU clusters using RoCEv2 and NCCL. Requires 10+ years of network experience, including 5+ in AI/HPC Ethernet, plus Python automation skills and travel to data centers.

Palo Alto, CA +1 | DevOps / SRE | On-site | NCCL | RoCEv2
xAI

Network Development Engineer, ML Infrastructure (High-Speed Interconnects)

Designs, builds, and optimizes high-speed copper and optical interconnects for large-scale AI/ML clusters. Requires 8+ years experience in high-speed networking, deep knowledge of SerDes, photonics, and Master's/PhD in EE/Photonics/Physics.

180k – 440k | Palo Alto, CA | DevOps / SRE | On-site | FEC | TIA
xAI

Software Engineer, Compute Infra

Designs, builds, and operates massive-scale compute clusters and custom container orchestration platforms for AI training and inference at exascale. Requires deep expertise in virtualization, containerization, systems programming in C++/Rust, and Linux kernel internals.

180k – 440k | Palo Alto, CA | DevOps / SRE | On-site | KVM | Xen
xAI

Network Engineer

Network Engineer deploys, operates, and troubleshoots global backbone, datacenter, and corporate networks. Requires BGP, TCP/IP, routing protocols knowledge, vendor experience (Juniper, Cisco, Arista, Aruba), with strong documentation skills.

180k – 440k | Memphis, TN | DevOps / SRE | On-site | BGP | NAC
xAI

Sr. Datacenter Operations Technician

Maintains and troubleshoots server and network infrastructure in data centers, focusing on minimizing MTTD and MTTR. Handles racking, cabling, inventory, and on-call emergencies. Requires 5+ years of hardware experience.

Memphis, TN +1 | DevOps / SRE | On-site | Bash | Jira
xAI

IT Systems Engineer

IT Systems Engineer builds, manages, and supports Windows/Linux infrastructure, VMware virtualization, and Puppet automation for corporate systems. Requires 3-5 years experience in systems engineering, troubleshooting, scripting, and on-call support in a fast-paced environment.

162k – 226k | Palo Alto, CA | DevOps / SRE | On-site | Bash | Perl
xAI

Datacenter Operations Technician

Maintains server and network infrastructure in data centers, focusing on troubleshooting, hardware installation, inventory management, and minimizing downtime (MTTD/MTTR). Requires 2+ years hardware experience, high school diploma, physical capability, and on-call availability.

Memphis, TN | DevOps / SRE | On-site | Bash | Jira