Software Engineer, Data Infrastructure
New York, NY | San Francisco, CA | Data Engineering | Remote | 4+ YOE
Summary
Builds and maintains petabyte-scale data storage infrastructure for AI training workloads. Requires 4+ years of experience with data storage infrastructure, strong Python, Kubernetes experience, and familiarity with distributed processing frameworks such as Apache Beam, Spark, or Flink.
About the role
Responsibilities
- Work directly on petabyte-scale storage infrastructure and the networking and performance challenges that come with it.
- Collaborate daily with researchers and engineers.
Requirements
- 4+ years of experience working on data storage infrastructure
- Strong command of Python
- Kubernetes experience, especially on the storage side (Persistent Volumes, CSI drivers, etc.)
- Ability to transform unstructured data into performant datasets across diverse storage backends including S3, GCS, and POSIX
- Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink
Nice-to-Haves
- Familiarity with modern analytics tooling such as BigQuery, Airflow, or dbt
Skills
Python, Kubernetes, S3, GCS, POSIX, Apache Beam, Spark, Flink, BigQuery, Airflow