Senior Software Engineer, Data

126k – 189k | Seattle, WA | Data Engineering | Onsite | 8+ YOE | May 2

Summary

Builds and maintains scalable data pipelines for the Semantic Scholar corpus, improves data quality with ML techniques such as entity resolution and classification, and designs APIs for AI research agents. Requires 8+ years of experience, strong Python and SQL skills, and familiarity with ML.

About the role

Your Next Challenge

Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain-specific datasets
Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enrichment
Develop and deploy ML models for entity disambiguation, author linking, and topic classification
Design and extend APIs that expose structured scholarly data to academic researchers and AI agent workflows
Contribute to dashboards and tools for evaluating data quality and model precision
Collaborate across engineering and research teams to ensure maintainability, test coverage, and robust deployment

What You’ll Need

Required:

Bachelor's degree and 8+ years of technical experience; relevant experience may substitute for education.
Strong Python engineering skills, especially for building and maintaining data pipelines
Experience with SQL and schema design in production settings (PostgreSQL preferred)
Familiarity with ML workflows (training classifiers, tuning models, deploying for inference), particularly for large-scale or ambiguous structured datasets
Comfortable working with structured data formats (XML/JSON/Parquet) and writing ETL code
Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (AWS, S3, Docker)
Strong communication skills and a strong sense of ownership of results

Preferred:

Experience with author disambiguation, entity resolution, or record linkage problems
Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
Familiarity with building APIs or data services consumed by automated or agent-based workflows

Compensation: Base salary range of $126,000 to $189,000, plus generous bonus plans.

Skills

Python, SQL, PostgreSQL, Airflow, AWS, S3, Docker, ML, ETL, Parquet, JSON, XML, entity resolution, vector similarity, topic modeling
