xAI DevOps / SRE Jobs
Open DevOps / SRE roles at xAI, pulled live from the company's hiring system.
Most common stacks in the current DevOps / SRE listings: Linux, Python, Bash. Most of these roles are on-site or hybrid; 7% are fully remote.
Member of Technical Staff
SRE role focused on automating reliability workflows, building observability, and ensuring uptime for multi-data center AI infrastructure. Requires 5+ years experience, strong Python skills, Linux expertise, and Kubernetes knowledge.
Sr. Software Engineer
Sr. Software Engineer focused on automating reliability workflows, observability, and incident response across multi-data center AI infrastructure. Requires strong Python skills, Linux expertise, and 3+ years SRE/infrastructure experience.
Water Treatment Engineer
Designs and optimizes water treatment systems for the cooling infrastructure of AI supercomputing facilities, ensuring reliability, efficiency, and compliance. Requires a bachelor's in engineering and 1+ years in industrial water treatment, with onsite work in the Memphis region.
Member of Technical Staff - Cloud Infrastructure
Designs, builds, and operates secure, scalable infrastructure including Kubernetes clusters and GPU hardware for large-scale AI workloads in classified US government environments. Requires 5+ years experience, Top Secret clearance, and expertise in IaC tools like Terraform and Ansible.
BIM Manager
Leads BIM standards, model coordination, and document control for capital projects, providing hands-on Revit drafting for MEP systems. Ensures design quality and owner interests from design through construction handover. Requires 7+ years BIM/Revit experience with MEP focus.
Controls Engineer
Controls Engineer implements and optimizes data center control systems including BMS, EPMS, SCADA, PLCs, and HMIs to ensure stability and efficiency. Requires 5+ years experience, bachelor's in engineering, Ignition SCADA, and Siemens TIA Portal proficiency.
Senior IT Systems Engineer
Senior IT Systems Engineer leads design, implementation, and optimization of SaaS platforms like Okta and Google Workspace, advances IAM programs, drives automation, and troubleshoots complex issues in hybrid environments. Requires 8+ years experience, IAM expertise, and scripting proficiency.
AI/HPC Network Development Engineer - Networking
Develops and optimizes high-performance Ethernet networks for massive AI/HPC GPU clusters using RoCEv2 and NCCL. Requires 10+ years of network experience (5+ in AI/HPC Ethernet), Python automation skills, and travel to data centers.
Network Development Engineer, ML Infrastructure (High-Speed Interconnects)
Designs, builds, and optimizes high-speed copper and optical interconnects for large-scale AI/ML clusters. Requires 8+ years experience in high-speed networking, deep knowledge of SerDes, photonics, and Master's/PhD in EE/Photonics/Physics.
Software Engineer, Compute Infra
Designs, builds, and operates massive-scale compute clusters and custom container orchestration platforms for AI training and inference at exascale. Requires deep expertise in virtualization, containerization, systems programming in C++/Rust, and Linux kernel internals.
Network Engineer
Network Engineer deploys, operates, and troubleshoots global backbone, datacenter, and corporate networks. Requires knowledge of BGP, TCP/IP, and routing protocols; vendor experience (Juniper, Cisco, Arista, Aruba); and strong documentation skills.
Sr. Datacenter Operations Technician
Maintains and troubleshoots server and network infrastructure in data centers, focusing on minimizing MTTD and MTTR. Handles racking, cabling, inventory, and on-call emergencies. Requires 5+ years of hardware experience.
IT Systems Engineer
IT Systems Engineer builds, manages, and supports Windows/Linux infrastructure, VMware virtualization, and Puppet automation for corporate systems. Requires 3-5 years experience in systems engineering, troubleshooting, scripting, and on-call support in a fast-paced environment.
Datacenter Operations Technician
Maintains server and network infrastructure in data centers, focusing on troubleshooting, hardware installation, inventory management, and minimizing downtime (MTTD/MTTR). Requires 2+ years hardware experience, high school diploma, physical capability, and on-call availability.