
Data Science Fellow - AI/NLP

90k – 100k | United States | Remote
Summary

Postdoctoral researcher developing AI/NLP and knowledge engineering methods to transform biomedical literature into structured, evidence-grounded knowledge for organoid protocol standardization. Requires PhD and strong Python/NLP research experience.

About the role

Responsibilities

  • Design and implement AI/NLP methods for biomedical literature mining and structured protocol knowledge extraction.
  • Develop benchmark datasets, annotation guidelines, and evaluation pipelines for scientific information extraction.
  • Build and evaluate RAG, in-context learning, fine-tuning, graph matching, entity normalization, and knowledge graph (KG) query workflows.
  • Analyze extraction errors, model behavior, retrieval failures, grounding quality, and biological ambiguity.
  • Collaborate with software engineers to integrate research methods into usable tools and reproducible pipelines.
  • Collaborate with organoid biologists and domain experts to translate biological protocol knowledge into computable representations.
  • Prepare manuscripts, conference abstracts, technical reports, design documents, and open-source research artifacts.
  • Help define research milestones, evaluation criteria, and publication strategy for protocol intelligence work.

Requirements

  • PhD in computer science, computational biology, bioinformatics, biomedical informatics, NLP, machine learning, data science, or a related field.
  • Strong Python programming skills.
  • Demonstrated research experience with NLP, information extraction, LLMs, RAG, transformers, structured prediction, or scientific text mining.
  • Ability to design controlled computational experiments, create benchmark datasets, and analyze results rigorously.
  • Familiarity with biological, biomedical, or scientific data.
  • Strong written communication skills and interest in publishing methods-oriented research.
  • Comfort working with complex, evolving research codebases and interdisciplinary teams.

Preferred Qualifications

  • Experience with scientific document processing, PDF parsing, biomedical literature mining, or methods-section extraction.
  • Experience with knowledge graphs, ontologies, graph databases, graph algorithms, or semantic data modeling.
  • Hands-on experience with fine-tuning LLMs, LoRA/QLoRA, Hugging Face, PyTorch, or API-based model evaluation.
  • Hands-on experience with prompt engineering, structured JSON extraction, schema validation, tool use, or agentic LLM workflows.
  • Hands-on experience with RAG systems, vector search, graph-augmented retrieval, or natural-language query over structured data.
  • Exposure to bioinformatics concepts (e.g., sequence alignment, clustering, or phylogenetic analysis).
  • Background in stem cell biology, organoids, developmental biology, wet-lab protocols, or biological assays.
Skills

Python, NLP, LLMs, RAG, Transformers, Information Extraction, Knowledge Graphs, PyTorch, Hugging Face, Fine-tuning