Improving dictionary-based named entity recognition with deep learning.

Katerina Nastou, Mikaela Koutrouli, Sampo Pyysalo, Lars Juhl Jensen

doi:10.1093/bioinformatics/btae402

Abstract

Dictionary-based named entity recognition (NER) allows terms to be detected in a corpus and normalized to biomedical databases and ontologies. However, adaptation to different entity types requires new high-quality dictionaries and associated lists of blocked names for each type. The latter have so far been created by manually inspecting individual names to identify those that cause many false positives, a process that scales poorly. In this work, we aim to improve block lists by automatically identifying names to block, based on the context in which they appear. By comparing results of three well-established biomedical NER methods, we generated a dataset of over 12.5 million text spans where the methods agree on the boundaries and type of entity tagged. These were used to generate positive and negative examples of contexts for four entity types (genes, diseases, species, and chemicals), which were used to train a Transformer-based model (BioBERT) to perform entity type classification. Application of the best model (F1-score = 96.7%) allowed us to generate a list of problematic names that should be blocked. Introducing this into our system doubled the size of the previous list of corpus-wide blocked names. In addition, we generated a document-specific list that allows ambiguous names to be blocked in specific documents. These changes boosted text mining precision by ∼5.5% on average, and by over 8.5% for chemical and 7.5% for gene names, with only a minor drop in recall (0.6%). This positively affects several biological databases that use this NER system, such as the STRING database. All resources are available through Zenodo https://doi.org/10.5281/zenodo.11243139 and GitHub https://doi.org/10.5281/zenodo.10289360.
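
The two steps described in the abstract, keeping only spans on which several taggers agree and then deriving corpus-wide and document-specific block lists from classifier predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Span` type, the function names, and the thresholds `min_mentions` and `max_agreement` are hypothetical choices made for illustration, and the classifier predictions are assumed to be available already as plain tuples.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """A tagged text span (hypothetical structure, for illustration only)."""
    doc_id: str
    start: int
    end: int
    entity_type: str


def agreed_spans(*tagger_outputs: Iterable[Span]) -> set[Span]:
    """Keep spans on which every tagger agrees on boundaries and type."""
    sets = [set(output) for output in tagger_outputs]
    return set.intersection(*sets) if sets else set()


def build_block_lists(predictions, min_mentions=3, max_agreement=0.2):
    """Derive block lists from (doc_id, name, dictionary_type, predicted_type).

    Both thresholds are assumptions, not values taken from the paper.
    """
    corpus = defaultdict(lambda: [0, 0])   # name -> [agreeing, total]
    per_doc = defaultdict(lambda: [0, 0])  # (doc_id, name) -> [agreeing, total]
    for doc_id, name, dict_type, pred_type in predictions:
        hit = int(dict_type == pred_type)
        for counts in (corpus[name], per_doc[(doc_id, name)]):
            counts[0] += hit
            counts[1] += 1

    # Corpus-wide: the predicted type rarely matches the dictionary type.
    corpus_block = {
        name for name, (agree, total) in corpus.items()
        if total >= min_mentions and agree / total <= max_agreement
    }
    # Document-specific: never matches in this document, but not blocked globally.
    doc_block = {
        key for key, (agree, _total) in per_doc.items()
        if key[1] not in corpus_block and agree == 0
    }
    return corpus_block, doc_block


if __name__ == "__main__":
    a = [Span("d1", 0, 4, "gene"), Span("d1", 10, 13, "species")]
    b = [Span("d1", 0, 4, "gene"), Span("d1", 10, 14, "species")]
    c = [Span("d1", 0, 4, "gene")]
    print(agreed_spans(a, b, c))

    preds = [
        ("d1", "was", "gene", "none"),
        ("d2", "was", "gene", "none"),
        ("d3", "was", "gene", "none"),
        ("d1", "cat", "species", "gene"),
        ("d2", "cat", "species", "species"),
        ("d2", "cat", "species", "species"),
    ]
    print(build_block_lists(preds))
```

In this toy input, "was" is never confirmed as a gene in any document and ends up on the corpus-wide list, while "cat" is only blocked in the one document where the predicted type disagrees with the dictionary.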

Journal: Bioinformatics (Oxford, England)
Publication Date: Sep 1, 2024
License type: cc-by

Similar Papers

Improving deep learning method for biomedical named entity recognition by using entity definition information
Ying Xiong ... Yi Zhou
BMC Bioinformatics | VOL. 22
01 Dec 2021

TaggerOne: joint named entity recognition and normalization with semi-Markov Models.
Robert Leaman ... Zhiyong Lu
Bioinformatics | VOL. 32
09 Jun 2016

An End-To-End NER Model with Explicit Boundary and Type Information
Ying Feng ... Zhe Chen
Journal of Physics: Conference Series | VOL. 2337
01 Sep 2022

Terminologies augmented recurrent neural network model for clinical named entity recognition.
Ivan Lerner ... Xavier Tannier
Journal of Biomedical Informatics | VOL. 102
16 Dec 2019
