CoNECo: a Corpus for Named Entity recognition and normalization of protein Complexes.

Katerina Nastou, Lars Juhl Jensen, Sampo Pyysalo, Mikaela Koutrouli

doi:10.1093/bioadv/vbae116

Abstract

Despite significant progress in biomedical information extraction, there is a lack of resources for Named Entity Recognition (NER) and Named Entity Normalization (NEN) of protein-containing complexes. Current resources inadequately address the recognition of protein-containing complex names across different organisms, underscoring the need for a dedicated corpus. We introduce the Complex Named Entity Corpus (CoNECo), an annotated corpus for NER and NEN of complexes. CoNECo comprises 1621 documents with 2052 entities, 1976 of which are normalized to Gene Ontology terms. We divided the corpus into training, development, and test sets and trained both a transformer-based and a dictionary-based tagger on them. On the test set, the two taggers achieved F-scores of 73.7% and 61.2%, respectively. We then applied the best-performing taggers to the entire openly accessible biomedical literature. All resources, including the annotated corpus, training data, and code, are available to the community through Zenodo https://zenodo.org/records/11263147 and GitHub https://zenodo.org/records/10693653.
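The abstract reports F-scores for a dictionary-based tagger without spelling out how such a tagger or the scoring works. The following minimal Python sketch illustrates both ideas. It assumes greedy longest-match lookup over word tokens, a hypothetical three-entry dictionary, and exact-match evaluation, meaning a prediction counts only if its character span and its Gene Ontology identifier both match a gold annotation. The GO identifiers are illustrative and should be checked against the current GO release. This is not the authors' tagger or evaluation script.

```python
import re
from typing import Dict, List, Set, Tuple

# A mention is (start_char, end_char, identifier); end is exclusive.
Mention = Tuple[int, int, str]

# Hypothetical toy dictionary: lower-cased complex names -> GO identifiers.
# Identifiers are illustrative only; verify them against the current GO release.
DICTIONARY: Dict[str, str] = {
    "proteasome": "GO:0000502",
    "26s proteasome": "GO:0000502",
    "ribosome": "GO:0005840",
}


def dictionary_tag(text: str, dictionary: Dict[str, str]) -> List[Mention]:
    """Greedy longest-match lookup: recognizes a name (NER) and returns its
    identifier (NEN) in one pass. Assumes word-token matching; this is a
    sketch, not the authors' actual tagger."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    max_len = max((len(name.split()) for name in dictionary), default=1)
    mentions: List[Mention] = []
    i = 0
    while i < len(tokens):
        step = 1  # advance one token if nothing matches here
        # Try the longest candidate span first so "26S proteasome" beats "proteasome".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            key = " ".join(text[s:e].lower() for s, e in window)
            if key in dictionary:
                mentions.append((window[0][0], window[-1][1], dictionary[key]))
                step = n  # skip past the matched tokens
                break
        i += step
    return mentions


def prf(gold: Set[Mention], predicted: Set[Mention]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-score (harmonic mean of P and R).
    A prediction is a true positive only if span and identifier both match."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def span(text: str, phrase: str, identifier: str) -> Mention:
    """Hypothetical helper: build a gold mention from the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), identifier)


# Toy example; the gold annotations are invented for illustration.
text = "The 26S proteasome, the ribosome and the spliceosome were detected."
gold = {
    span(text, "26S proteasome", "GO:0000502"),
    span(text, "ribosome", "GO:0005840"),
    span(text, "spliceosome", "GO:0005681"),  # not in the toy dictionary
}
predicted = set(dictionary_tag(text, DICTIONARY))
p, r, f = prf(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=1.000 R=0.667 F1=0.800
```

Exact matching is only one convention. Lenient (overlap) matching also credits partially correct spans and typically yields higher scores. The abstract does not state which criterion was used, so this example is only a sketch.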

Journal: Bioinformatics Advances
Publication Date: Jan 5, 2024
License type: CC BY 4.0

Similar Papers

Towards a Novel Weakly Supervised Joint Approach of Named Entity Recognition and Normalization for Noisy Text
Assia Mezhar ... Mohammed Ramdani
SSRN Electronic Journal | 01 Jan 2018

TaggerOne: joint named entity recognition and normalization with semi-Markov Models.
Robert Leaman ... Zhiyong Lu
Bioinformatics | VOL. 32 | 09 Jun 2016

Impact of Translation on Biomedical Information Extraction: Experiment on Real-Life Clinical Notes.
Christel Gérardin ... Xavier Tannier
JMIR medical informatics | VOL. 12 | 04 Apr 2024

Graph-Based Jointly Modeling Entity Detection and Linking in Domain-Specific Area
Jiangtao Zhang ... Juanzi Li
01 Jan 2015