Augmenting biomedical named entity recognition with general-domain resources

Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

doi:10.1016/j.jbi.2024.104731


Open Access


Journal: Journal of Biomedical Informatics
Publication Date: Oct 4, 2024
License type: CC BY


Abstract


Objective: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotation. Several studies have used multi-task learning over multiple BioNER datasets to reduce human effort, but this approach does not consistently improve performance and may introduce label ambiguity across biomedical corpora. We aim to address these challenges through transfer learning from easily accessible resources that have fewer concept overlaps with biomedical datasets.

Methods: We proposed GERBERA, a simple yet effective method that uses general-domain NER datasets for training. We first performed multi-task learning, training a pre-trained biomedical language model on both the target BioNER dataset and the general-domain dataset. We then fine-tuned the models specifically on the target BioNER dataset.

Results: We systematically evaluated GERBERA on five datasets covering eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models outperformed baseline models trained with additional BioNER datasets. Specifically, our models outperformed the baseline models on six of the eight entity types, with an average improvement of 0.9% over the best baseline performance across the eight entity types. The method was especially effective on BioNER datasets with limited data, yielding a 4.7% improvement in F1 score on the JNLPBA-RNA dataset.

Conclusion: This study introduces a new training method that leverages cost-effective general-domain NER datasets to augment BioNER models. The approach significantly improves BioNER model performance, making it a valuable option when biomedical datasets are scarce or costly. We make data, code, and models publicly available at https://github.com/qingyu-qc/bioner_gerbera.
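The two-stage procedure described under Methods can be sketched as a batch schedule. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how batches from the two datasets are mixed, so strict alternation is assumed here, and the function name `two_stage_schedule`, the task labels, and the batch counts are all hypothetical.

```python
from itertools import zip_longest
from typing import List, Tuple

# One training step: (stage, task name, batch index within that task's dataset).
Step = Tuple[str, str, int]


def two_stage_schedule(
    n_target_batches: int,
    n_general_batches: int,
    finetune_epochs: int = 1,
) -> List[Step]:
    """Return the order in which batches would be visited.

    Stage 1 (multi-task): batches from the target BioNER dataset and the
    general-domain NER dataset are interleaved (alternation is an assumption).
    Stage 2 (fine-tuning): only target BioNER batches are visited.
    """
    schedule: List[Step] = []

    # Stage 1: alternate between the two datasets until both are exhausted.
    target = [("multitask", "bioner", i) for i in range(n_target_batches)]
    general = [("multitask", "general", i) for i in range(n_general_batches)]
    for target_step, general_step in zip_longest(target, general):
        if target_step is not None:
            schedule.append(target_step)
        if general_step is not None:
            schedule.append(general_step)

    # Stage 2: continue training on the target BioNER dataset alone.
    for _ in range(finetune_epochs):
        schedule.extend(("finetune", "bioner", i) for i in range(n_target_batches))

    return schedule


# Usage: hypothetical batch counts, chosen only to show the ordering.
for stage, task, batch_index in two_stage_schedule(2, 3, finetune_epochs=1):
    print(stage, task, batch_index)
```

In a real training loop each step would load the indicated batch and update the shared pre-trained biomedical language model; the schedule only makes the ordering of the two stages explicit.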


Similar Papers

Improving deep learning method for biomedical named entity recognition by using entity definition information
Ying Xiong ... Yi Zhou
BMC Bioinformatics | VOL. 22
01 Dec 2021

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
Usman Naseem ... Matloob Khushi
18 Jul 2021

Language model based on deep learning network for biomedical named entity recognition
Guan Hou ... Yuhao Jian
Methods | VOL. 226
17 Apr 2024

Cross-type biomedical named entity recognition with deep multi-task learning.
Xuan Wang ... Curtis Langlotz
Bioinformatics | VOL. 35
11 Oct 2018
