# Senior Data Engineer 
**Company:** [Machinify](https://hotfix.jobs/companies/machinify)
**Location:** Remote
**Salary:** $180K-$220K
**Experience:** 6+ years
**Skills:** Python, Spark SQL, Airflow, AWS, Kafka, SQS, Parquet, JSON, CSV, Spark
**Posted:** 2026-03-31
> Build and scale production data pipelines using Python, Spark, and Airflow to transform raw healthcare data into trusted datasets powering ML models, dashboards, and customer onboarding. Requires 6+ years of experience, strong expertise in data engineering tools, and a track record of cross-team collaboration.
## Job Description
## What You’ll Do

- Design and implement robust, production-grade pipelines using **Python**, **Spark SQL**, and **Airflow** to process high-volume file-based datasets (CSV, Parquet, JSON).
- Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
- Own the full lifecycle of core pipelines — from file ingestion to validated, queryable datasets — ensuring high reliability and performance.
- Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers (AMs), and Product to ensure a successful implementation and to troubleshoot issues as they arise.
- Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
- Refactor and scale existing pipelines to meet growing data and business needs.
- Tune Spark jobs and optimize distributed processing performance.
- Implement schema enforcement and versioning aligned with internal data standards.
- Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
- Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
- Contribute to the evolution of our data platform — driving toward mature patterns in observability, testing, and automation.
- Build and enhance streaming pipelines (**Kafka**, **SQS**, or similar) where needed to support near-real-time data needs.
- Help develop and champion internal best practices around pipeline development and data modeling.

## What You Bring

- 6+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
- Strong expertise in **Python**, **Spark SQL**, and **Airflow**.
- Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
- Experience mapping and standardizing raw external data into canonical models.
- Familiarity with **AWS** (or another major cloud provider), including file storage and distributed compute concepts.
- Experience onboarding new customers and integrating external customer data with non-standard formats.
- Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
- Strong written and verbal communication skills — able to explain technical concepts to non-engineering partners.
- Comfortable designing pipelines from scratch and improving existing pipelines.
- Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
- Experience building streaming pipelines with tools such as **Kafka** or **SQS**, or a willingness to learn.

**Bonus**: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).

## What We Offer

- Work from anywhere in the US!
- Full Medical/Dental/Vision for employees & their families.
- Flexible and trusting environment.
- Unlimited FTO.
- Competitive salary, equity, 401(k) including employer match.
- Base salary range: **$180K-$220K**.

**Apply:** https://hotfix.jobs/jobs/senior-data-engineer-at-machinify-4fa22788-e776-497b-acd1-f046dbb129b4
**Canonical:** https://hotfix.jobs/jobs/senior-data-engineer-at-machinify-4fa22788-e776-497b-acd1-f046dbb129b4