Software Engineer, Distributed Data Systems (Sora)
Designs and scales distributed data infrastructure for large-scale multimodal training and evaluation at OpenAI. Collaborates with researchers to build reliable, high-performance systems handling massive data volumes in a fast-paced environment.
Responsibilities
- Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security.
- Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient.
- Partner with researchers to deeply understand requirements and translate them into production-ready systems.
- Harden, optimize, and maintain critical data infrastructure systems that power multimodal training and evaluation.
Requirements
- Strong experience with distributed systems and large-scale infrastructure, along with a strong interest in data.
- Detail-oriented, bringing rigor to building and maintaining reliable systems.
- Excellent software engineering fundamentals and organizational skills.
- Comfortable with ambiguity and rapid change.
Staff Data Platform Engineer
Builds and leads AWS-native data platform architecture, orchestration, governance, and AI-readiness for analytics and ML workloads. Requires 8-10+ years of experience with AWS data systems and strong technical leadership.
Manager, Data Engineering
Leads and mentors a team of data engineers building scalable data pipelines and platform infrastructure. Involves hands-on coding, operational excellence, and cross-functional collaboration with analytics, data science, and business teams.
Senior Software Engineer, Events Analytics Platform
Senior backend/infrastructure engineer expanding Sentry's time-series data platform (Snuba/ClickHouse) to handle petabyte-scale events with sub-second latency. Requires 4+ years of experience and distributed storage expertise.