MMBERT: a unified framework for biomedical named entity recognition.

Lei Fu,Zuquan Weng,Jiheng Zhang,Yiqing Cao,Haihe Xie

doi:10.1007/s11517-023-02934-8

Abstract

Named entity recognition (NER) is an important task in natural language processing (NLP), and in recent years it has attracted considerable attention in the biomedical field. Biomedical NER is harder than general-domain NER, however, because biomedical NER datasets are scarce and biomedical named entities are complex and rare. In this paper, we propose MMBERT, a Transformer-based framework that addresses these problems. To address the scarcity of biomedical NER datasets, we introduce ERNIE-Health, a new Chinese language representation model pre-trained on large-scale biomedical text corpora. To handle the complexity and rarity of biomedical named entities, we use BERT and CW-LSTM structures to obtain a joint feature vector of word-pair relations. In addition, we design a multi-granularity 2D convolution to refine the relationships between word pairs and their representations. Finally, we design a convolutional neural network (CNN) structure and a co-predictor to improve the model's generalization capability and prediction accuracy. Extensive experiments on three benchmark datasets show that our model outperforms several baseline models.
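
The abstract does not say how the multi-granularity 2D convolution is implemented. One common reading, assumed here and not confirmed by the source, is that the same small kernel is applied to the word-pair grid at several dilation rates, so that each cell sees neighbouring word pairs at different distances. The sketch below illustrates that idea in plain Python on a single-channel grid; the function names, the kernel, and the dilation rates (1, 2, 3) are hypothetical choices for illustration, not the authors' settings.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(grid: Grid, kernel: Grid, dilation: int) -> Grid:
    """Same-size 2D cross-correlation with zero padding and a dilation rate.

    Assumes a square kernel with an odd side length (hypothetical helper).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    # Offsets are stretched by the dilation rate.
                    r = i + (a - half) * dilation
                    c = j + (b - half) * dilation
                    # Cells outside the grid count as zero padding.
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        total += kernel[a][b] * grid[r][c]
            out[i][j] = total
    return out


def multi_granularity(
    grid: Grid, kernel: Grid, dilations: Sequence[int] = (1, 2, 3)
) -> List[Grid]:
    """Return one refined grid per dilation rate (assumed granularities)."""
    return [dilated_conv2d(grid, kernel, d) for d in dilations]


# Toy 5x5 word-pair grid: cell (i, j) holds a score for the pair (word i, word j).
pair_grid = [[1.0] * 5 for _ in range(5)]
sum_kernel = [[1.0] * 3 for _ in range(3)]
refined = multi_granularity(pair_grid, sum_kernel)
```

In a real model each grid cell would hold a feature vector rather than a single score, the kernels would be learned, and the per-dilation outputs would typically be concatenated along the channel dimension before being passed to the predictor.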
