Software Engineer, Data Infrastructure - Research
Designs and implements dataset infrastructure for OpenAI's large-scale LLM training stack, including standardized APIs for multimodal data, scaling pipelines across GPU fleets, and performance debugging. Requires strong distributed systems experience and collaboration with researchers.
Responsibilities
- Design and maintain standardized dataset APIs, including for multimodal (MM) data too large to fit in memory.
- Build proactive testing and scale validation pipelines for dataset loading at GPU scale.
- Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines, ensuring smooth adoption and a great user experience.
- Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt.
- Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized.
- Debug and resolve performance bottlenecks in distributed dataset loading (e.g., straggler machines slowing down global training).
- Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets.
Requirements
- Strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure.
- Experience building APIs, modular code, and scalable abstractions, while recognizing that abstractions ultimately serve their users and that UX is an important part of abstraction design.
- Comfortable debugging bottlenecks across large fleets of machines.
- Take pride in building infrastructure that “just works,” and find joy in being the guardian of reliability and scale.
- Collaborative, humble, and excited to own a foundational (if not glamorous) part of the ML stack.
Nice-to-Haves
- Background knowledge in data math, probability, or distributed data theory.
- Experience with GPU-scale distributed systems or dataset scaling for real-time data.
Manager, Data Engineering
Lead and mentor a team of data engineers building scalable data pipelines and platform infrastructure. Hands-on coding, operational excellence, and cross-functional collaboration with analytics, data science, and business teams.
Staff Analytics Engineer
CodeRabbit is seeking a Staff Analytics Engineer to build and own its BigQuery and dbt data foundation. The role involves architecting the data warehouse, defining key metrics, building revenue models, and developing GTM intelligence layers.
Staff Data Engineer, Ads
Discord is seeking a Staff Data Engineer to lead the technical vision and strategy for its ads data infrastructure. The role involves building and maintaining sophisticated data pipelines, datasets, and analytical tools, and mentoring other engineers.