Deep neural network-based recognition of entities in Chinese online medical inquiry texts

Xin Liu, Yanju Zhou, Zongrun Wang

doi:10.1016/j.future.2020.08.022

Abstract

Correctly identifying entities such as disease names, symptoms, and drugs in Chinese online medical inquiry texts is challenging. On the one hand, traditional natural language processing methods cannot be applied directly to online medical inquiries. Although supervised and unsupervised learning algorithms offer entity recognition strategies for medical inquiries on online platforms, these methods either rely heavily on specific knowledge sources or hand-designed features or, despite being strongly self-adaptive, rarely achieve good entity recognition results; as a consequence, they generalize poorly. On the other hand, Chinese online medical inquiry data is large in volume and dominated by unstructured text, medical entities are sparsely distributed, and Chinese characters and words are complex. It is therefore difficult to build a robust model for recognizing entities in Chinese online medical inquiry texts. This paper proposes a new deep neural network, OMINer-CE, to address this problem. First, to form a suitable feature strategy, basic features used in other tasks are adopted, and extended features, such as the continuous bag-of-words cluster (CBOWC) feature, are constructed. Second, fused character-word feature vectors are introduced to preserve all the Chinese character information of the original sequences while adding word-level semantic information. Third, a context encoding layer and a label decoding layer are introduced. Building on the recurrent neural network model BiLSTM, a convolutional neural network (CNN) is added to learn more key local features of the Chinese word context, and an attention mechanism is used to capture long-distance dependencies between Chinese words, forming the OMINer model.
Finally, the basic and extended feature vectors are integrated into the OMINer model, so that the grammatical and semantic information contained in the labeled texts is captured from different perspectives. The resulting model thus accounts for the feature strategies of Chinese online medical platforms and achieves strong self-adaptivity by means of the deep neural network. Experiments show that better performance is obtained by combining the CE basic and extended feature vectors in the BiLSTM of the OMINer model, suggesting that the OMINer-CE model improves the recognition of entities in Chinese online medical inquiry texts.
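
As an illustration of the character-word fusion idea described above, the following is a minimal sketch, not the paper's implementation. It assumes the text has already been segmented into words and that fusion means attaching to each character the word that contains it plus a position tag in the common B/M/E/S scheme; the abstract does not specify the actual fusion operation. The function name `fuse_chars_with_words` and the sample sentence are hypothetical.

```python
from typing import List, Tuple


def fuse_chars_with_words(words: List[str]) -> List[Tuple[str, str, str]]:
    """Return one (character, containing_word, position_tag) triple per character.

    Assumption: position_tag follows the B/M/E/S scheme, i.e. Begin, Middle,
    or End of a multi-character word, or Single-character word. Every
    character of the original sequence is kept, and word-level information
    is attached to it.
    """
    fused = []
    for word in words:
        n = len(word)
        for i, ch in enumerate(word):
            if n == 1:
                tag = "S"
            elif i == 0:
                tag = "B"
            elif i == n - 1:
                tag = "E"
            else:
                tag = "M"
            fused.append((ch, word, tag))
    return fused


# Hypothetical pre-segmented inquiry: "cold / take / amoxicillin"
segmented = ["感冒", "吃", "阿莫西林"]
fused = fuse_chars_with_words(segmented)
for triple in fused:
    print(triple)
```

In a full model, each element of a triple would typically be mapped to an embedding and the embeddings combined into a single input vector per character before the context encoding layer; that step is omitted here.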

Full Text