Member of Technical Staff - Large Scale Data Infrastructure

Builds scalable data infrastructure for petabyte- to exabyte-scale training on thousands of GPUs, including data loaders, petabyte-scale storage systems, multi-cloud abstractions, and performance debugging for AI models.

180k – 300k | San Francisco, CA | Data Engineering | Hybrid

About the role

What You’ll Work On

Scalable data loaders for training runs across thousands of GPUs
Efficient storage and retrieval systems for petabyte-scale datasets
Multi-cloud object storage abstraction
Large-scale data migrations across storage systems and providers
Debugging and resolving performance bottlenecks in distributed data loading

Technical Focus

Python, PyTorch DataLoader internals
Object storage (e.g. S3, Azure Blob, GCS)
Parquet for metadata
Video: ffmpeg, PyAV, codec fundamentals

What We’re Looking For

Built and operated data pipelines at petabyte scale
Optimized data loading
Worked with petabyte-scale video and image datasets
Written processing jobs operating on millions of files
Debugged distributed system bottlenecks across large fleets of machines

Nice to have

Experience with streaming dataset formats (e.g. WebDataset)
Video codec internals and frame-accurate seeking
Distributed systems experience
Slurm and Kubernetes for job orchestration
Experience with object storage performance tuning across providers

Base Annual Salary (SF based role): $180,000–$300,000 USD + Equity

Skills

Python, PyTorch, S3, GCS, Azure Blob, Parquet, FFmpeg, PyAV, Kubernetes, Slurm, WebDataset
