Senior Software Engineer, Data Platform
United States | Remote | 5+ YOE
Summary
Design, build, and scale the company's data infrastructure on GCP using Spark, Databricks, and Fivetran. Own ETL/ELT pipelines, CDC, data governance, and streaming/batch data flows serving analytics, product, and compliance use cases.
About the role
Key Responsibilities
- Contribute to the design, development, and scaling of core data infrastructure using GCP, Spark, Databricks, and Fivetran
- Develop robust and maintainable ETL/ELT workflows that support diverse structured and unstructured data needs
- Implement and manage Change Data Capture (CDC) pipelines to enable near real-time data replication and synchronization
- Define and enforce data governance and compliance standards, including access control, auditability, lineage, and metadata management
- Build and manage streaming and batch data pipelines to serve high-impact use cases across analytics, product, compliance, and experimentation
- Act as a strategic partner to cross-functional teams (product, analytics, engineering, clinical) to ensure data is accessible, trustworthy, and impactful
- Drive the long-term architectural vision of the data platform to support current and future business and product needs
Requirements
- 5+ years of experience in software engineering, with a focus on scalable data architectures
- Strong expertise in GCP (IAM, GCS, Pub/Sub, etc.) and hands-on experience with Spark and Databricks
- Hands-on experience with CDC technologies such as Fivetran or an equivalent tool
- Proficiency in ETL/ELT tools and frameworks (dbt, Apache Airflow, Dataform, etc.)
- Deep understanding of data governance principles, including compliance and security best practices
- Demonstrated success in collaborating across functions to deliver data solutions for analytics, experimentation, or compliance
- Balance of individual contributor (IC) execution and leadership skills; comfortable both rolling up sleeves and mentoring others
- Familiarity with streaming data architecture, real-time ingestion, and delivery frameworks
- Proficient in SQL and Python for data processing and automation
- Strong problem-solving skills with the ability to work in a fast-paced environment
- Excellent communication and technical storytelling skills
Nice-to-Haves
- Experience with Terraform or Infrastructure-as-Code (IaC) for data infrastructure automation
- Background in HIPAA or other regulated environments with sensitivity to data privacy and compliance
- Familiarity with the dbt Semantic Layer and modern data modeling best practices
- Exposure to data observability platforms and practices
- Familiarity with machine learning data pipelines
- Exposure to multi-cloud or hybrid-cloud environments
- Experience building scalable solutions in a 0-1 environment
Skills
GCP, Google Cloud, Spark, Databricks, Fivetran, Change Data Capture, CDC, ETL, ELT, dbt, Apache Airflow, Dataform, SQL, Python, Terraform