Senior Software Engineer, Data

126k – 189k | Seattle, WA | Data Engineering | Onsite | 8+ YOE
Summary

Builds and maintains scalable data pipelines for the Semantic Scholar corpus, improves data quality with ML techniques such as entity resolution and classification, and designs APIs for AI research agents. Requires 8+ years of experience, strong Python/SQL skills, and familiarity with ML.

About the role

Your Next Challenge

  • Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain-specific datasets
  • Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enrichment
  • Develop and deploy ML models for entity disambiguation, author linking, and topic classification
  • Design and extend APIs that expose structured scholarly data to academic researchers and AI agent workflows
  • Contribute to dashboards and tools for evaluating data quality and model precision
  • Collaborate across engineering and research teams to ensure maintainability, test coverage, and robust deployment

What You’ll Need

Required:

  • Bachelor's degree and 8+ years of technical experience; relevant experience may substitute for education.
  • Strong Python engineering skills, especially for building and maintaining data pipelines
  • Experience with SQL and schema design in production settings (PostgreSQL preferred)
  • Familiarity with ML workflows (training classifiers, tuning models, deploying for inference), particularly for large-scale or ambiguous structured datasets
  • Comfort working with structured data formats (XML/JSON/Parquet) and writing ETL code
  • Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (AWS, S3, Docker)
  • Strong communication skills and a strong sense of ownership for results

Preferred:

  • Experience with author disambiguation, entity resolution, or record linkage problems
  • Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
  • Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
  • Familiarity with building APIs or data services consumed by automated or agent-based workflows

Compensation: Base salary range $126,000 - $189,000, plus generous bonus plans.

Skills
Python, SQL, PostgreSQL, Airflow, AWS, S3, Docker, ML, ETL, Parquet, JSON, XML, entity resolution, vector similarity, topic modeling