OpenAI DevOps / SRE Jobs

Open DevOps / SRE roles at OpenAI, pulled live from the company's hiring system.

43 open · OpenAI · DevOps / SRE

DevOps / SRE roles at OpenAI cluster around $255k, with most listings between $230k and $293k. 56% of open DevOps / SRE roles call out Kubernetes; Python and Distributed Systems each appear in roughly a third. Most of these roles are on-site or hybrid; 5% are fully remote.

Latest DevOps / SRE roles at OpenAI
OpenAI

Software Engineer, Full-Stack — Developer Experience

Build and operate scalable CI and Bazel-based build systems that accelerate engineering velocity and reliability for OpenAI's products and infrastructure.

185k – 490k · San Francisco, CA +2 · DevOps / SRE · On-site · Bazel · Kafka
OpenAI

Tech Lead, Deployment & Operations — Custom Infrastructure

Lead deployment and operations for OpenAI’s custom silicon and systems into data center environments. Drive hardware bring-up, validation, production deployment, and fleet reliability at scale while leading a technical team.

342k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Tooling · Automation
OpenAI

Datacenter NetDeploy Lead - Stargate

Leads end-to-end physical network deployments in data centers, overseeing vendor execution, fiber/cabling installation, testing, validation, and handover to operations. Requires 10+ years in data center network infrastructure delivery, strong cabling and topology knowledge, and cross-team coordination skills.

126k – 228k · United States · DevOps / SRE · Remote · SOWs · BOMs
OpenAI

Software Engineer, Frontier Systems

Builds infrastructure to monitor, detect, remediate, and verify hardware health across global GPU/CPU clusters at hyperscale. Owns node lifecycle workflows and partners with teams to ensure compute reliability for AI training and inference. Requires 7+ years experience with Python, distributed systems, and operational tooling.

250k – 445k · San Francisco, CA · DevOps / SRE · On-site · SQL · GPU
OpenAI

Software Engineer, Productivity - Inference Runtime

Builds and improves CI/CD, testing, validation, and release tooling for OpenAI's inference runtime teams to ensure reliable, performant model deployments across ChatGPT, API, and research workloads. Requires strong Python skills, developer productivity experience, and high ownership in ambiguous environments.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · GPU
OpenAI

Software Engineer, Core Network Engineering

Builds and operates high-performance networking infrastructure for OpenAI's large-scale AI training and inference, focusing on host networking, datacenter fabrics, and WAN systems. Optimizes latency, reliability, and scalability using technologies like RDMA, InfiniBand, and RoCE; requires strong systems programming in C++, Python, or Go.

230k – 342k · San Francisco, CA · DevOps / SRE · On-site · Go · C++
OpenAI

Networking Operating System Firmware Engineer

Develops and maintains custom networking operating system firmware for AI supercomputers, integrating Linux kernel, switch ASICs, and control-plane services. Requires deep expertise in SONiC, SAI, routing protocols, and platform bring-up across hardware and software boundaries.

266k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Go · SAI
OpenAI

Performance & Systems Engineer, Codex

Optimizes performance across the Codex AI system's stack, including LLM inference, cloud orchestration, and agent behavior, to reduce latency and costs. Collaborates with researchers and engineers on high-impact improvements in a high-ownership role.

295k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Kubernetes · ML systems
OpenAI

Software Engineer, Productivity - Model Performance

Builds and improves developer tools, CI/CD pipelines, and testing workflows to boost productivity for OpenAI's model performance engineering teams. Requires strong Python skills, experience with developer infrastructure, and ability to work in ambiguous environments.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · Rust
OpenAI

Software Engineer, Productivity - Networking

Enhances developer productivity for OpenAI's networking team by improving build systems, CI/CD pipelines, test harnesses, and workflows for C++ and Python codebases in multi-server environments. Requires experience with developer tools and infrastructure automation.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · CI/CD
OpenAI

Compute Optimization Researcher/Engineer

Develops optimization models, forecasting frameworks, and planning systems to maximize compute capacity utilization across GPU clusters, data centers, and cloud providers. Requires PhD and 5+ years in optimization or infrastructure planning with strong Python and solver expertise.

293k – 455k · San Francisco, CA +1 · DevOps / SRE · Hybrid · SQL · Spark
OpenAI

Tokens-as-a-Service (Taas) Software Engineer

Builds systems and tooling to measure, monitor, and optimize token throughput from GPU infrastructure for OpenAI workloads. Integrates partner compute environments, benchmarks performance, analyzes tokenomics, and develops operational metrics and dashboards. Requires strong distributed systems and infrastructure engineering experience.

293k – 455k · San Francisco, CA +1 · DevOps / SRE · Hybrid · Kubernetes · Dashboards
OpenAI

Software Engineer, Compute Infrastructure

Builds and optimizes large-scale compute infrastructure for AI workloads, spanning hardware automation, distributed systems, Kubernetes orchestration, networking, storage, and developer tools. Requires strong systems engineering experience in performance, reliability, and production infrastructure.

230k – 405k · San Francisco, CA +2 · DevOps / SRE · Hybrid · NCCL · RDMA
OpenAI

Systems Engineer (Network / Storage / Systems)

Architects, validates, and operationalizes networking, storage, and hardware infrastructure for large-scale AI compute environments. Requires 7+ years in systems engineering with expertise in hardware bring-up, debugging, and vendor management in fast-paced settings.

335k – 455k · San Francisco, CA · DevOps / SRE · Hybrid · Go · Bash
OpenAI

CPU Storage Tech Lead

Leads technical strategy for CPU platforms, memory, and storage architectures in large-scale AI data centers. Evaluates vendor roadmaps, drives platform decisions, and ensures optimization for AI training and inference with 10+ years experience in server hardware and hyperscale infrastructure.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · x86 · ARM
OpenAI

CPU/Storage/PoP-WAN Program Manager

Leads execution of CPU, storage, PoP, and WAN infrastructure programs to activate compute clusters and expand global networks. Requires 8+ years in technical program management with deep knowledge of hardware, networking, and data center deployments.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · WAN · Azure
OpenAI

Data Center Controls Network Engineer

Designs, validates, and scales secure OT network architectures for high-density AI data centers, including controls systems, telemetry, and integration with IT infrastructure. Requires 8+ years in OT networking, industrial protocols, and resilient topologies in mission-critical environments.

257k – 327k · San Francisco, CA · DevOps / SRE · Hybrid · PRP · HSR
OpenAI

Workload Porting & Performance Engineer

Evaluates new hardware platforms by porting benchmarks and workloads, analyzes performance across compute/memory/networking, identifies bottlenecks, and optimizes for AI systems. Requires expertise in performance analysis, system architecture, and debugging across hardware/software boundaries.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · CPU · GPU
OpenAI

3P Architect

Defines rack- and cluster-level reference architectures for AI infrastructure, translates workload requirements into designs, collaborates with partners and modeling teams to evaluate tradeoffs, and drives vendor roadmaps to address technology gaps.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · ODM · JDM
OpenAI

Performance Modeling Engineer ~2

Develop and maintain performance modeling tools to analyze AI system behavior, evaluate tradeoffs in compute, memory, networking, and storage. Requires 1-2 years experience in software engineering or systems analysis, strong programming, and analytical skills.

266k – 445k · San Francisco, CA +1 · DevOps / SRE · Hybrid · Python · Networking
OpenAI

Performance Modeling Engineer

Develops and maintains performance modeling tools and frameworks to evaluate AI system behavior, analyze tradeoffs in compute, memory, networking, and storage. Collaborates with architects on simulations and insights for infrastructure design; requires strong software/modeling background and system architecture knowledge.

266k – 445k · San Francisco, CA +1 · DevOps / SRE · Hybrid · C++ · Python
OpenAI

Software Engineer, Engineering Acceleration | Consumer Devices

Builds and operates CI/CD systems, developer workflows, and internal platforms to accelerate engineering velocity for consumer device software across device and cloud. Requires 7+ years experience with deep CI/CD and platform expertise.

230k – 342k · San Francisco, CA · DevOps / SRE · Hybrid · CI/CD · Bazel
OpenAI

Software Engineer, Kernel Performance & AI Tooling

Develops kernel performance optimizations, AI-assisted tooling, and observability infrastructure for AI-native hardware. Requires strong low-level systems experience, kernel/accelerator expertise, and familiarity with AI workflows for engineering acceleration.

266k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · GPUs · CPUs
OpenAI

Software Engineer, Infrastructure, Consumer Devices

Designs and builds scalable cloud infrastructure platforms powering OpenAI's consumer products, focusing on Kubernetes orchestration, reliability, and growth. Requires 8+ years experience leading large-scale systems with strong systems thinking.

325k – 440k · San Francisco, CA · DevOps / SRE · Hybrid · AWS · GCP
OpenAI

ChatGPT Performance Engineer

Optimizes infrastructure and application performance for ChatGPT and the OpenAI API, focusing on latency, throughput, and efficiency at scale. Requires 7+ years in high-scale systems with expertise in profiling, tracing, and cross-layer optimizations.

325k – 405k · San Francisco, CA +2 · DevOps / SRE · Remote · Python · Golang