
Compute Optimization Researcher/Engineer

293k – 455k · San Francisco, CA · Seattle, WA · Hybrid · 5+ YOE
Summary

Develops optimization models, forecasting frameworks, and planning systems to maximize compute capacity utilization across GPU clusters, data centers, and cloud providers. Requires a PhD and 5+ years of experience in optimization or infrastructure planning, with strong Python and solver expertise.

About the role

Responsibilities

  • Build optimization models for compute allocation, workload scheduling, and cluster utilization.
  • Develop planning systems that balance supply, demand, cost, latency, and reliability constraints.
  • Create forecasting frameworks for GPU demand, infrastructure growth, and capacity needs.
  • Design decision tools for allocating compute across internal teams, products, and strategic priorities.
  • Partner with architecture, infrastructure engineering, finance, and operations teams to translate business needs into mathematical models.
  • Integrate multiple operational data sources into planning systems and optimization workflows.
  • Improve utilization of GPUs, networking, power, cooling, and storage infrastructure.
  • Analyze tradeoffs across first-party data centers, cloud providers, and hybrid environments.
  • Build dashboards, metrics, and operational tooling for capacity decision-making.
  • Lead ambiguous, cross-functional initiatives that improve infrastructure efficiency at scale.
  • Present recommendations clearly to technical leaders and executives.
  • Continuously refine models based on changing workloads, supply constraints, and business priorities.

Requirements

  • Doctoral degree in Computer Science, Engineering, Mathematics, Operations Research, Economics, or a related field.
  • 5+ years of experience in optimization, planning, infrastructure analytics, or systems engineering.
  • Strong experience with linear programming, mixed-integer optimization, convex optimization, simulation, or forecasting methods.
  • Proficiency in Python and data tooling (SQL, Pandas, Spark, etc.).
  • Experience translating real-world business constraints into scalable optimization systems.
  • Strong analytical problem-solving skills with comfort operating in ambiguous environments.
  • Ability to influence cross-functional stakeholders without formal authority.
  • Excellent communication skills with both technical and non-technical audiences.

Preferred Qualifications

  • Experience with large-scale infrastructure, cloud capacity planning, or data center operations.
  • Familiarity with solvers and modeling tools such as Gurobi, CPLEX, CVXPY, or Pyomo.
  • Experience optimizing GPU fleets, networking systems, or distributed compute environments.
  • Background in supply-demand planning, logistics, marketplace optimization, or resource scheduling.
  • Experience working in fast-scaling technology environments.
Skills
Python, SQL, Pandas, Spark, linear programming, mixed-integer optimization, convex optimization, Gurobi, CPLEX, CVXPY