Software Engineer, Data Infrastructure

New York, NY | San Francisco, CA | Data Engineering | Remote | 4+ YOE

Summary

Builds and maintains petabyte-scale data storage infrastructure for AI training workloads. Requires 4+ years of experience in data storage infrastructure, strong Python skills, Kubernetes experience, and familiarity with distributed processing frameworks such as Spark or Beam.

About the role

Responsibilities

  • Work directly on petabyte-scale storage infrastructure and the networking and performance challenges that come with it.
  • Collaborate daily with researchers and engineers.

Requirements

  • 4+ years of experience working on data storage infrastructure
  • Strong command of Python
  • Kubernetes experience, especially on the storage side (Persistent Volumes, CSI drivers, etc.)
  • Ability to transform unstructured data into performant datasets across diverse storage backends, including S3, GCS, and POSIX file systems
  • Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink

Nice-to-Haves

  • Familiarity with modern analytics tooling such as BigQuery, Airflow, or dbt

Skills

Python, Kubernetes, S3, GCS, POSIX, Apache Beam, Spark, Flink, BigQuery, Airflow