Software Engineer, Distributed Data Systems (Sora)

230k – 385k | San Francisco, CA | Hybrid

Summary

Designs and scales distributed data infrastructure for large-scale multimodal training and evaluation at OpenAI. Collaborates with researchers to build reliable, high-performance systems handling massive data volumes in a fast-paced environment.

About the role

Responsibilities

  • Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security.
  • Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient.
  • Partner with researchers to deeply understand requirements and translate them into production-ready systems.
  • Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation.

Requirements

  • Strong experience with distributed systems and large-scale infrastructure, and a strong interest in data.
  • Detail-oriented, bringing rigor to building and maintaining reliable systems.
  • Excellent software engineering fundamentals and organizational skills.
  • Comfortable with ambiguity and rapid change.

Skills

  • distributed systems
  • data orchestration
  • distributed storage
  • streaming infrastructure
  • machine learning infrastructure
  • Kubernetes
  • Apache Spark
  • Apache Kafka
  • AWS
  • Google Cloud