
Data Engineer

Senior Data Engineer building scalable data pipelines and infrastructure on AWS using Spark, Metaflow, and container orchestration. Requires 5+ years of experience designing distributed data systems.

145k – 190k | United States | Data Engineering | Remote | 5+ YOE

About the role

What You’ll Be Doing

  • Design and implement the data architecture, ensuring scalability, flexibility, and efficiency using pipeline authoring tools like Metaflow and large-scale data processing technologies like Spark.
  • Define and extend our internal standards for style, maintenance, and best practices for a high-scale data platform.
  • Collaborate with researchers and other stakeholders to understand their data needs, including model training and production monitoring systems, and develop solutions that meet those requirements.
  • Take ownership of key data engineering projects and work independently to design, develop, and maintain high-quality data solutions.
  • Ensure data quality, integrity, and security by implementing robust data validation, monitoring, and access controls.
  • Evaluate and recommend data technologies and tools to improve the efficiency and effectiveness of the data engineering process.
  • Continuously monitor, maintain, and improve the performance and stability of the data infrastructure.

Who We’re Looking For

  • 5+ years of relevant experience in data engineering.
  • Expertise in designing and developing distributed data pipelines using big data technologies on large-scale data sets.
  • Deep, hands-on experience designing, planning, productionizing, maintaining, and documenting reliable and scalable data infrastructure and data products in complex environments.
  • Solid experience with big data processing and analytics on AWS, using services such as Amazon EMR and AWS Batch.
  • Experience with large-scale data processing technologies such as Spark.
  • Expertise in orchestrating workflows using tools like Metaflow.
  • Experience with various database technologies, including SQL and NoSQL databases (e.g., AWS DynamoDB, Elasticsearch, PostgreSQL).
  • Hands-on experience with containerization technologies, such as Docker and Kubernetes.
  • Prior software engineering experience is a big plus.

Nice to Haves

  • Experience working at an early-stage startup.
  • Experience in a HIPAA-compliant environment.
  • Experience working on machine learning or healthcare-related projects.

Skills

Spark, Metaflow, AWS, Amazon EMR, AWS Batch, SQL, NoSQL, DynamoDB, Elasticsearch, Postgres, Docker, Kubernetes

GTM Analytics Engineer

Builds and refines ICP models, automates RevOps processes, and architects GTM data integrations using SQL, Python, and tools like Snowflake and Salesforce. Requires 3-5+ years of experience driving revenue analytics.

145k – 170k | United States | Data Engineering | Remote | 3+ YOE | SQL | n8n

Analytics Engineer

Builds and maintains a semantic data layer using dbt and SQL in BigQuery for business metrics, dashboards, and reverse ETL. Owns orchestration, HIPAA compliance, and data quality in a healthcare tech environment. Requires 3+ years of experience.

145k – 190k | New York, NY +1 | Data Engineering | Hybrid | 3+ YOE | SQL | dbt

Healthcare Data Analyst

Creates advanced SQL/Spark SQL queries and prompt-engineered LLM workflows to transform healthcare claims data into clinical insights and automated policy tools. Requires 3-5 years of SQL experience plus 2-3 years of healthcare experience.

140k – 170k | United States | Data Engineering | Remote | 3+ YOE | SQL | Claude

Analytics Engineer

Builds and maintains Confido's centralized data warehouse and analytics infrastructure. Designs scalable data models, establishes data standards, and enables self-service analytics across the organization.

150k – 190k | New York, NY | Data Engineering | On-site | SQL | dbt

Research Engineer, Data

Research Engineers build data systems and pipelines that power reliable AI workflows for enterprise customers. They design evaluation frameworks, develop data quality systems, and collaborate with researchers and engineers to turn frontier AI concepts into production-ready solutions.

150k – 250k | San Francisco, CA +1 | Data Engineering | Hybrid | SQL | Python