Data Engineer

New York, NY | San Francisco, CA | Data Engineering | Remote | 5+ YOE | Apr 15

Summary

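Builds and maintains production-grade data processing systems and storage infrastructure for advanced AI platforms. Requires 5+ years of experience with production-grade data processing systems, a strong command of Python and SQL, experience with a distributed processing framework such as Apache Beam, Spark, or Flink, and the ability to work across storage backends such as S3 and GCS.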

About the role

Responsibilities

Work directly on storage infrastructure, product launches, and new customer experiences built on one of the most advanced AI systems in the world
Collaborate daily with researchers and engineers
Run implementations end-to-end and see initiatives through to real outcomes
Partner across research, marketing, sales, and finance to help define how Cohere grows, with your recommendations feeding directly into products and strategy

Requirements

5+ years of experience working on production-grade data processing systems
Strong command of Python and SQL
Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink
Ability to transform unstructured data into performant datasets across diverse storage backends, including S3, GCS, and POSIX file systems

Nice-to-haves

Experience with modern orchestration platforms, especially Kubernetes
Familiarity with modern analytics stack tooling such as BigQuery, Airflow, or dbt
Knowledge of Java or Golang
Genuine excitement about AI

Skills

Python, SQL, Apache Beam, Spark, Flink, S3, GCS, Kubernetes, BigQuery, Airflow