Software Engineer, Productivity - Model Performance

230k – 385k · San Francisco, CA · Onsite
Summary

Builds and improves developer tools, CI/CD pipelines, and testing workflows to boost productivity for OpenAI's model performance engineering teams. Requires strong Python skills, experience with developer infrastructure, and ability to work in ambiguous environments.

About the role

Responsibilities

  • Improve development workflows for engineers working on model performance infrastructure
  • Design and improve CI/CD, release, validation, and testing pipelines
  • Build and maintain tools that improve reliability, iteration speed, and engineering confidence
  • Partner closely with engineers to identify friction in testing, debugging, deployment, and development workflows
  • Contribute to infrastructure efforts that support performance-critical training and inference systems
  • Help improve developer experience across Python-heavy codebases and performance-oriented infrastructure
  • Work in a high-context, ambiguous environment where ownership and good judgment matter

Requirements

  • Motivated by enabling other engineers and helping them do their best work
  • Strong experience with CI/CD, developer infrastructure, testing systems, tooling, or build/release workflows
  • Highly collaborative, empathetic, and comfortable partnering deeply with technical teams
  • Strong in Python and enjoys building reliable, scalable developer tools and infrastructure
  • Experience improving large-scale engineering workflows, especially around CI reliability, test infrastructure, and debugging velocity
  • Self-directed and comfortable operating with ambiguity
  • Excited to learn the model performance domain

Nice-to-haves

  • Experience in the PyTorch ecosystem
  • Experience with C++ or Rust
Skills
Python, CI/CD, PyTorch, Triton, C++, Rust, testing frameworks, developer tooling, infrastructure, CI pipelines