MLNER: Exploiting Multi-source Lexicon Information Fusion for Named Entity Recognition in Chinese Medical Text

Yinlong Xiao, Jianqiang Li, Jieqing Chen, Qing Zhao, Zhenning Cheng

doi:10.1109/compsac51774.2021.00147

Abstract

Integrating lexicon information into character-based models is an active topic in Chinese Named Entity Recognition (NER) research. Most methods use information from only a single lexicon, usually a general one. However, in Chinese medical text, which contains a large amount of medical terminology, a single lexicon, especially a general lexicon, offers little performance improvement for Chinese NER. In this paper, we propose a Multi-source Lexicon Information Fusion method for Named Entity Recognition in Chinese Medical Text (MLNER), which can utilize information from both general and medical lexicons. Because annotated medical corpora are small, we combine our model with a pre-trained model, exploiting its rich representation capability to improve performance on small datasets. Experiments show that our method can effectively improve the performance of NER in Chinese medical text. Our model is also applicable to Chinese NER tasks in other domain-specific fields, with good scalability and application value.

Full Text