Data Engineer
New York, NY | San Francisco, CA | Data Engineering | Remote | 5+ YOE
Summary
Builds and maintains production-grade data processing systems and storage infrastructure for advanced AI platforms. Requires 5+ years of experience with Python, SQL, distributed processing frameworks such as Spark, Beam, or Flink, and cloud storage such as S3 or GCS.
About the role
Responsibilities
- Work directly on storage infrastructure, product launches, and new customer experiences built on one of the most advanced AI systems in the world
- Collaborate daily with researchers and engineers
- Run implementations end to end and see initiatives through to real outcomes
- Partner across research, marketing, sales, and finance to help define how Cohere grows, with your recommendations feeding directly into products and strategy
Requirements
- 5+ years of experience working on production-grade data processing systems
- Strong command of Python and SQL
- Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink
- Ability to transform unstructured data into performant datasets across diverse storage backends, including S3, GCS, and POSIX file systems
Nice-to-haves
- Experience with modern orchestration platforms, especially Kubernetes
- Familiarity with modern analytics stack tooling such as BigQuery, Airflow, or dbt
- Knowledge of Java or Golang
- Genuine excitement about AI
Skills
Python, SQL, Apache Beam, Spark, Flink, S3, GCS, Kubernetes, BigQuery, Airflow