Software Engineer, Core Network Engineering

230k – 342k · San Francisco, CA · Onsite · May 6

Summary

Builds and operates high-performance networking infrastructure for OpenAI's large-scale AI training and inference, focusing on host networking, datacenter fabrics, and WAN systems. Optimizes latency, reliability, and scalability using technologies like RDMA, InfiniBand, and RoCE. Requires strong systems programming skills in C++, Python, or Go.

About the role

Responsibilities

Design, build, and operate networking systems that support large-scale AI training and inference infrastructure
Improve performance, reliability, and scalability across host networking, datacenter fabrics, and WAN systems
Develop automation for provisioning, configuration management, validation, upgrades, and lifecycle management of networking infrastructure
Build tooling and observability systems for network health, performance analysis, debugging, and automated remediation
Optimize network performance across technologies such as RDMA, RoCE, InfiniBand, Ethernet, and high-performance GPU interconnects
Define and operationalize networking protocols, readiness criteria, and continuous validation systems
Partner closely with compute, storage, hardware, and infrastructure teams to ensure networking scales predictably with fleet growth
Contribute to architecture decisions around topology design, capacity planning, failure domains, and network reliability
Diagnose complex distributed systems and networking issues across large heterogeneous compute environments

Requirements

Experience building or operating large-scale networking or distributed systems infrastructure
Comfortable working close to the hardware/software boundary
Experience with Linux networking, kernel systems, NICs, RDMA, or performance-sensitive infrastructure software
Experience working with high-performance networking technologies such as InfiniBand, RoCE, DPDK, or large-scale Ethernet fabrics
Experience with datacenter networking, WAN systems, or host networking stacks
Enjoys debugging complex systems and performance bottlenecks across multiple layers of the stack
Comfortable writing production software in languages such as C++, Python, or Go
Strong systems fundamentals across networking, operating systems, distributed systems, or infrastructure engineering

Skills

Linux networking, RDMA, InfiniBand, RoCE, DPDK, C++, Python, Go, Kubernetes, Ethernet
