Senior Software Engineer, Data
Builds and maintains scalable data pipelines for the Semantic Scholar corpus, improves data quality with ML techniques such as entity resolution and classification, and designs APIs for AI research agents. Requires 8+ years of experience, strong Python/SQL skills, and familiarity with ML.
Your Next Challenge
- Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain-specific datasets
- Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enrichment
- Develop and deploy ML models for entity disambiguation, author linking, and topic classification
- Design and extend APIs that expose structured scholarly data to academic researchers and AI agent workflows
- Contribute to dashboards and tools for evaluating data quality and model precision
- Collaborate across engineering and research teams to ensure maintainability, test coverage, and robust deployment
What You’ll Need
Required:
- Bachelor's degree and 8+ years of technical experience; relevant experience may substitute for education
- Strong Python engineering skills, especially for building and maintaining data pipelines
- Experience with SQL and schema design in production settings (PostgreSQL preferred)
- Familiarity with ML workflows (training classifiers, tuning models, deploying for inference), particularly for large-scale or ambiguous structured datasets
- Comfort working with structured data formats (XML/JSON/Parquet) and writing ETL code
- Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (AWS, S3, Docker)
- Strong communication skills and a strong sense of ownership of results
Preferred:
- Experience with author disambiguation, entity resolution, or record linkage problems
- Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
- Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
- Familiarity with building APIs or data services consumed by automated or agent-based workflows
Compensation: Base salary range $126,000 - $189,000, plus generous bonus plans.
Lead Analytics Engineer
Lead Analytics Engineer responsible for shaping data architecture, mentoring the team, and delivering end-to-end data solutions that power decisions across Product, Marketing, Operations, and Finance. Requires 7+ years of experience, expert SQL, advanced dbt, and proven data architecture impact.
Analytics Engineer
Build and own core data models, ETL pipelines, and analytics infrastructure to enable data-driven decisions across the company and clients. Requires 2+ years building analytical products, strong SQL/Python, and modern data stack experience.