OTAR3088 NLP-model collection

Work Package 1 - Knowledge Extraction (NLP)

Background

Within this working group of the wider OTAR3088 'Automating Knowledge Management' project, we aim to modernise and extend the current named entity recognition workflows of EuropePMC / Open Targets to cover an array of entity types relevant to drug discovery (such as variants, biomarkers, tissues/cell types, adverse events, and assay conditions). These new entities will provide higher confidence in the relevance of a target-disease association.

Since NLP models are constantly updated and fine-tuned, we have created a modular, flexible framework that facilitates the creation of new NLP models.

OTAR3088 HuggingFace

This organisation space documents the project's data development and model generation. Data is organised by the broader entity type the group is studying at a given time, and the sources of the data are described in the data cards. Output models are also shared here.

Learn more about our project, resources, and more: