Key Responsibilities

Build & Maintain Data Pipelines: Design, implement, and maintain scalable data pipelines using modern data tools to process and manage large datasets efficiently.
ETL/ELT Development: Develop and optimize ETL/ELT pipelines to ingest, transform, and deliver data from multiple internal and external sources.
Workflow Orchestration: Build and manage workflows using Apache Airflow to ensure reliable scheduling and monitoring of data processes.
Query Engines & Processing Frameworks: Leverage tools such as Trino/Presto, Apache Spark, and related distributed processing technologies to support analytics and data applications.
Data Modeling & Warehousing: Contribute to schema design and data modeling efforts to ensure clean, well-structured, and scalable data architecture.
Data Quality & Governance Support: Implement monitoring, validation checks, and best practices to ensure data accuracy, consistency, and reliability.
Optimize Data Infrastructure: Use AWS services (S3, Redshift, Glue, Athena, Lambda) and modern data technologies (e.g., Apache Iceberg) to support a scalable and efficient data platform.
Cross-Functional Collaboration: Partner with engineering, product, analytics, and business teams to understand requirements and deliver high-quality data solutions.
Monitor & Improve Performance: Proactively monitor pipelines and workflows, troubleshoot issues, and continuously improve performance and reliability.

Qualifications

5+ years of experience building and maintaining data pipelines, or working in data engineering or a related role.
Hands-on experience with data tools such as Apache Airflow, Apache Spark, Apache Iceberg, Trino/Presto, and AWS services (S3, Redshift, Glue, Athena, Lambda).
Proficiency in Python (or a similar language) for data processing and pipeline development.
Solid understanding of data warehousing concepts, schema design, and data modeling best practices.
Experience deploying and supporting data pipelines in production environments.
Strong analytical skills and the ability to diagnose and resolve data-related issues.
Ability to communicate effectively with both technical and non-technical stakeholders and work in cross-functional teams.