Software Engineer, Data Infrastructure

New York, NY | San Francisco, CA | Data Engineering | Remote | 4+ YOE | May 14

Summary

Builds and maintains petabyte-scale data storage infrastructure for AI training workloads. Requires 4+ years of data infrastructure experience, plus proficiency with Python, Kubernetes, and distributed processing frameworks such as Spark or Beam.

About the role

Responsibilities

Work directly on petabyte-scale storage infrastructure and the networking and performance challenges that come with it.
Collaborate daily with researchers and engineers.

Requirements

4+ years of experience working on data storage infrastructure
Strong command of Python
Kubernetes experience, especially on the storage side (Persistent Volumes, CSI drivers, etc.)
Ability to transform unstructured data into performant datasets across diverse storage backends including S3, GCS, and POSIX
Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink

Nice-to-Haves

Familiarity with modern analytics tooling such as BigQuery, Airflow, or dbt

Skills

Python, Kubernetes, S3, GCS, POSIX, Apache Beam, Spark, Flink, BigQuery, Airflow