Data Engineer

Design, build, and maintain data pipelines for biomedical and clinical research datasets. Work with scientists and researchers to deliver accessible, well-governed data products using Python, SQL, and ETL/ELT processes.

Rockville, MD | Data Engineering | Onsite

About the role

Key Responsibilities

Data Pipeline Development

Design, build, test, and maintain data pipelines to ingest, transform, harmonize, and integrate diverse biomedical and research data sources, including clinical, genomic, experimental, imaging, biospecimen, operational, and other scientific datasets
Develop reusable transformation logic and curated datasets that support analytics, reporting, dashboards, applications, APIs, and downstream research workflows

Data Integration and Lifecycle Support

Support the full research data lifecycle by enabling reliable data movement from source systems and storage environments into structured, analysis-ready formats
Assist with data ingestion, curation, metadata capture, data refreshes, source-to-target mapping, schema management, and long-term maintainability of data products and workflows

Collaboration

Work closely with data scientists, bioinformaticians, researchers, application developers, project managers, and government stakeholders to gather requirements and deliver practical data solutions
Translate scientific and operational data needs into technical specifications, data models, transformation logic, and reusable datasets

Quality & Governance

Implement data validation checks, reconciliation routines, testing practices, and monitoring processes to ensure data accuracy, completeness, consistency, and integrity
Follow data governance and security best practices, including documentation of transformations, lineage, assumptions, access requirements, and compliance considerations

Dashboarding & Integration

Create or support interactive dashboards, reporting layers, APIs, and application-ready datasets
Support integration between data pipelines, databases, cloud platforms, analytics environments, and approved application platforms

Operational Support and Modernization

Troubleshoot data pipeline failures, source system inconsistencies, data quality issues, schema changes, access issues, and performance bottlenecks
Contribute to modernization efforts by improving automation, documentation, scalability, reproducibility, and platform readiness

Required Qualifications

Bachelor's degree in Computer Science, Data Science, Bioinformatics, Biomedical Informatics, Information Systems, Engineering, or a related field, or equivalent practical experience
Proven experience as a Data Engineer, Analytics Engineer, Data Integration Developer, or Bioinformatics Engineer, or in a similar data-intensive role
Strong proficiency in Python and SQL for data manipulation, transformation, scripting, automation, and analysis
Hands-on experience building ETL/ELT processes and data pipelines to support large, complex, multi-source datasets
Familiarity with scalable data processing approaches, including Spark/PySpark or similar frameworks
Solid understanding of data modeling, relational databases, data warehouses, data lakes, metadata, and database concepts
Ability to work with complex, multi-modal datasets, including structured, semi-structured, and unstructured data
Knowledge of software engineering and data engineering best practices, including version control using Git, code review, automated testing, documentation, and change management
Experience ensuring data quality and using lineage, provenance tracking, audit trails, or documentation practices
Excellent problem-solving skills and the ability to communicate effectively with both technical and non-technical stakeholders
Strong interest in biomedical science, clinical research, healthcare data, and scientific discovery
Demonstrated awareness of sensitive data handling, privacy, access control, data governance, and regulatory or compliance expectations

Preferred Qualifications

Hands-on experience building data solutions in modern data platforms or platform-as-a-service environments such as Snowflake, Databricks, Palantir, cloud data warehouses, data lakes, or similar platforms
Experience supporting integrations across databases, cloud storage, APIs, analytics platforms, dashboards, and application environments
Experience preparing curated datasets for dashboards, APIs, web applications, reporting tools, notebooks, or scientific computing environments
Familiarity with research-facing tools and platforms such as Posit Connect, R/Shiny, Streamlit, Jupyter, Galaxy, Code

Skills

Python, SQL, ETL, ELT, Spark, PySpark, Git, Snowflake, Databricks, Data Modeling
