OpenAI Data Engineering Jobs
Open data engineering roles at OpenAI, pulled live from their hiring system.
67% of open data engineering roles call out Python; Airflow and Kubernetes appear in roughly a third. Most of these data engineering roles are on-site or hybrid; none are fully remote.
Technical Lead Manager, Data Engineering, Trust & Safety
Lead and grow the Trust & Safety Data Engineering team, defining roadmap and technical strategy. Build privacy-safe datasets and pipelines for abuse detection, fraud detection, and safety monitoring. Partner with stakeholders to ensure launch readiness and operational rigor.
IT Controls Data Engineer
As an IT Controls Data Engineer, you will build and maintain data infrastructure for audit readiness, IT controls, and continuous control monitoring. This role involves designing pipelines, datasets, and automated validation to ensure reliable control data.
Senior Data Engineer, Core Experimentation
Build and manage data pipelines and canonical datasets for the experimentation platform, tracking product metrics like user growth and revenue. Collaborate with cross-functional teams at OpenAI; requires 3+ years of data engineering experience with Spark, ETL tools, and distributed systems.
Data Engineer, People Innovation Labs
Build and manage data pipelines for people analytics and internal products like OpenHouse at OpenAI's People Innovation Labs. Collaborate with analytics and engineering teams using Databricks, Spark, and ETL tools; requires 3+ years of data engineering experience.
Lead to Opportunity, Data Systems Engineer
Build and integrate Salesforce lead-to-opportunity systems for GTM teams, focusing on data enrichment, workflow automation, and cross-system orchestration to drive pipeline creation at scale. Requires advanced Salesforce development expertise and cross-functional collaboration.
Software Engineer, Research - Human Data
Build full-stack systems, tools, and infrastructure for human feedback collection, AI model alignment, and evaluation. Collaborate with researchers to scale production systems and enhance model safety in a fast-paced environment.
Software Engineer, Distributed Data Systems (Sora)
Designs and scales distributed data infrastructure for large-scale multimodal training and evaluation at OpenAI. Collaborates with researchers to build reliable, high-performance systems handling massive data volumes in a fast-paced environment.
Software Engineer, Data Infrastructure - Research
Designs and implements dataset infrastructure for OpenAI's large-scale LLM training stack, including standardized APIs for multimodal data, scaling pipelines across GPU fleets, and performance debugging. Requires strong distributed systems experience and collaboration with researchers.
Software Engineer, Habitat (Online Data)
Builds and operates Habitat, OpenAI's core online database platform handling high-QPS, latency-sensitive workloads. Owns end-to-end distributed systems for storage, caching, routing, CDC, and privacy; requires 8+ years of experience, with Rust/Python expertise.
Software Engineer, Data Infrastructure
Builds and operates scalable data infrastructure including compute fleets, storage systems, and streaming platforms to support OpenAI's AI products, research, and analytics. Requires 4+ years in data or infrastructure engineering with expertise in Spark, Kafka, and distributed systems.
Data Engineer, Analytics
Build and manage data pipelines and canonical datasets for product metrics, safety systems, and business decisions. Collaborate with cross-functional teams including Data Science and Research; requires 3+ years of data engineering experience with Spark, ETL tools, and distributed systems.
Software Engineer, Data Acquisition
Builds and leads data acquisition systems including web crawling, ingestion, and scalable distributed processing for model training. Requires 4+ years of experience, expertise in Kubernetes and large-scale data systems, and a BS/MS/PhD in Computer Science.