OpenAI DevOps / SRE Jobs

Open DevOps / SRE roles at OpenAI, pulled live from its hiring system.

43 open · OpenAI · DevOps / SRE

DevOps / SRE roles at OpenAI cluster around $255k, with most listings between $230k and $293k. 56% of open DevOps / SRE roles call out Kubernetes; Python and distributed systems appear in roughly a third. Most of these roles are on-site or hybrid; 5% are fully remote.

Latest DevOps / SRE roles at OpenAI

OpenAI

May 25

Software Engineer, Full-Stack — Developer Experience

Build and operate scalable CI and Bazel-based build systems that accelerate engineering velocity and reliability for OpenAI's products and infrastructure.

185k – 490k · San Francisco, CA +2 · DevOps / SRE · On-site · Bazel · Kafka

OpenAI

May 16

Tech Lead, Deployment & Operations — Custom Infrastructure

Lead deployment and operations for OpenAI’s custom silicon and systems into data center environments. Drive hardware bring-up, validation, production deployment, and fleet reliability at scale while leading a technical team.

342k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Tooling · Automation

OpenAI

May 14

Datacenter NetDeploy Lead - Stargate

Leads end-to-end physical network deployments in data centers, overseeing vendor execution, fiber/cabling installation, testing, validation, and handover to operations. Requires 10+ years in data center network infrastructure delivery, strong cabling and topology knowledge, and cross-team coordination skills.

126k – 228k · United States · DevOps / SRE · Remote · SOWs · BOMs

OpenAI

May 11

Software Engineer, Frontier Systems

Builds infrastructure to monitor, detect, remediate, and verify hardware health across global GPU/CPU clusters at hyperscale. Owns node lifecycle workflows and partners with teams to ensure compute reliability for AI training and inference. Requires 7+ years experience with Python, distributed systems, and operational tooling.

250k – 445k · San Francisco, CA · DevOps / SRE · On-site · SQL · GPU

OpenAI

May 10

Software Engineer, Productivity - Inference Runtime

Builds and improves CI/CD, testing, validation, and release tooling for OpenAI's inference runtime teams to ensure reliable, performant model deployments across ChatGPT, API, and research workloads. Requires strong Python skills, developer productivity experience, and high ownership in ambiguous environments.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · GPU

OpenAI

May 6

Software Engineer, Core Network Engineering

Builds and operates high-performance networking infrastructure for OpenAI's large-scale AI training and inference, focusing on host networking, datacenter fabrics, and WAN systems. Optimizes latency, reliability, and scalability using technologies like RDMA, InfiniBand, and RoCE; requires strong systems programming in C++, Python, or Go.

230k – 342k · San Francisco, CA · DevOps / SRE · On-site · Go · C++

OpenAI

May 6

Networking Operating System Firmware Engineer

Develops and maintains custom networking operating system firmware for AI supercomputers, integrating Linux kernel, switch ASICs, and control-plane services. Requires deep expertise in SONiC, SAI, routing protocols, and platform bring-up across hardware and software boundaries.

266k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Go · SAI

OpenAI

May 1

Performance & Systems Engineer, Codex

Optimizes performance across the Codex AI system's stack, including LLM inference, cloud orchestration, and agent behavior, to reduce latency and costs. Collaborates with researchers and engineers on high-impact improvements in a high-ownership role.

295k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · Kubernetes · ML systems

OpenAI

Apr 29

Software Engineer, Productivity - Model Performance

Builds and improves developer tools, CI/CD pipelines, and testing workflows to boost productivity for OpenAI's model performance engineering teams. Requires strong Python skills, experience with developer infrastructure, and ability to work in ambiguous environments.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · Rust

OpenAI

Apr 28

Software Engineer, Productivity - Networking

Enhances developer productivity for OpenAI's networking team by improving build systems, CI/CD pipelines, test harnesses, and workflows for C++ and Python codebases in multi-server environments. Requires experience with developer tools and infrastructure automation.

230k – 385k · San Francisco, CA · DevOps / SRE · On-site · C++ · CI/CD

OpenAI

Apr 27

Compute Optimization Researcher/Engineer

Develops optimization models, forecasting frameworks, and planning systems to maximize compute capacity utilization across GPU clusters, data centers, and cloud providers. Requires PhD and 5+ years in optimization or infrastructure planning with strong Python and solver expertise.

293k – 455k · San Francisco, CA +1 · DevOps / SRE · Hybrid · SQL · Spark

OpenAI

Apr 27

Tokens-as-a-Service (Taas) Software Engineer

Builds systems and tooling to measure, monitor, and optimize token throughput from GPU infrastructure for OpenAI workloads. Integrates partner compute environments, benchmarks performance, analyzes tokenomics, and develops operational metrics and dashboards. Requires strong distributed systems and infrastructure engineering experience.

293k – 455k · San Francisco, CA +1 · DevOps / SRE · Hybrid · Kubernetes · Dashboards

OpenAI

Apr 27

Software Engineer, Compute Infrastructure

Builds and optimizes large-scale compute infrastructure for AI workloads, spanning hardware automation, distributed systems, Kubernetes orchestration, networking, storage, and developer tools. Requires strong systems engineering experience in performance, reliability, and production infrastructure.

230k – 405k · San Francisco, CA +2 · DevOps / SRE · Hybrid · NCCL · RDMA

OpenAI

Apr 24

Systems Engineer (Network / Storage / Systems)

Architects, validates, and operationalizes networking, storage, and hardware infrastructure for large-scale AI compute environments. Requires 7+ years in systems engineering with expertise in hardware bring-up, debugging, and vendor management in fast-paced settings.

335k – 455k · San Francisco, CA · DevOps / SRE · Hybrid · Go · Bash

OpenAI

Apr 21

CPU Storage Tech Lead

Leads technical strategy for CPU platforms, memory, and storage architectures in large-scale AI data centers. Evaluates vendor roadmaps, drives platform decisions, and ensures platforms are optimized for AI training and inference. Requires 10+ years of experience in server hardware and hyperscale infrastructure.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · x86 · ARM

OpenAI

Apr 21

CPU/Storage/PoP-WAN Program Manager

Leads execution of CPU, storage, PoP, and WAN infrastructure programs to activate compute clusters and expand global networks. Requires 8+ years in technical program management with deep knowledge of hardware, networking, and data center deployments.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · WAN · Azure

OpenAI

Apr 20

Data Center Controls Network Engineer

Designs, validates, and scales secure OT network architectures for high-density AI data centers, including controls systems, telemetry, and integration with IT infrastructure. Requires 8+ years in OT networking, industrial protocols, and resilient topologies in mission-critical environments.

257k – 327k · San Francisco, CA · DevOps / SRE · Hybrid · PRP · HSR

OpenAI

Apr 20

Workload Porting & Performance Engineer

Evaluates new hardware platforms by porting benchmarks and workloads, analyzes performance across compute/memory/networking, identifies bottlenecks, and optimizes for AI systems. Requires expertise in performance analysis, system architecture, and debugging across hardware/software boundaries.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · CPU · GPU

OpenAI

Apr 20

3P Architect

Defines rack- and cluster-level reference architectures for AI infrastructure, translates workload requirements into designs, collaborates with partners and modeling teams to evaluate tradeoffs, and drives vendor roadmaps to address technology gaps.

342k – 555k · San Francisco, CA +1 · DevOps / SRE · Hybrid · ODM · JDM

OpenAI

Apr 20

Performance Modeling Engineer ~2

Develops and maintains performance modeling tools to analyze AI system behavior and evaluate tradeoffs in compute, memory, networking, and storage. Requires 1-2 years of experience in software engineering or systems analysis, plus strong programming and analytical skills.

266k – 445k · San Francisco, CA +1 · DevOps / SRE · Hybrid · Python · Networking

OpenAI

Apr 20

Performance Modeling Engineer

Develops and maintains performance modeling tools and frameworks to evaluate AI system behavior, analyze tradeoffs in compute, memory, networking, and storage. Collaborates with architects on simulations and insights for infrastructure design; requires strong software/modeling background and system architecture knowledge.

266k – 445k · San Francisco, CA +1 · DevOps / SRE · Hybrid · C++ · Python

OpenAI

Apr 17

Software Engineer, Engineering Acceleration | Consumer Devices

Builds and operates CI/CD systems, developer workflows, and internal platforms to accelerate engineering velocity for consumer device software across device and cloud. Requires 7+ years experience with deep CI/CD and platform expertise.

230k – 342k · San Francisco, CA · DevOps / SRE · Hybrid · CI/CD · Bazel

OpenAI

Apr 17

Software Engineer, Kernel Performance & AI Tooling

Develops kernel performance optimizations, AI-assisted tooling, and observability infrastructure for AI-native hardware. Requires strong low-level systems experience, kernel/accelerator expertise, and familiarity with AI workflows for engineering acceleration.

266k – 445k · San Francisco, CA · DevOps / SRE · Hybrid · GPUs · CPUs

OpenAI

Apr 15

Software Engineer, Infrastructure, Consumer Devices

Designs and builds scalable cloud infrastructure platforms powering OpenAI's consumer products, focusing on Kubernetes orchestration, reliability, and growth. Requires 8+ years experience leading large-scale systems with strong systems thinking.

325k – 440k · San Francisco, CA · DevOps / SRE · Hybrid · AWS · GCP

OpenAI

Apr 15

ChatGPT Performance Engineer

Optimizes infrastructure and application performance for ChatGPT and the OpenAI API, focusing on latency, throughput, and efficiency at scale. Requires 7+ years in high-scale systems with expertise in profiling, tracing, and cross-layer optimizations.

325k – 405k · San Francisco, CA +2 · DevOps / SRE · Remote · Python · Golang
