Abstract

Named entity recognition is a fundamental task in natural language processing. To address the shortcomings of conventional BERT pre-training, namely the random masking of single characters, and the large number of out-of-vocabulary words and polysemous words in electronic medical records, a model combining Whole Word Masking BERT (BERT-WWM) with an iterated dilated convolutional network (IDCNN) was designed. In the word embedding layer, the Whole Word Masking BERT pre-trained model is introduced to enrich the semantic representation of word vectors. IDCNN is then used to extract sentence features as fully as possible. Finally, a conditional random field (CRF) outputs the tag sequence with the maximum probability. Experimental results show that this model is well suited to entity recognition in Chinese electronic medical records: its precision, recall, and F1 score all exceed 91%, outperforming traditional methods.
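The pipeline described above (BERT-WWM embeddings → IDCNN feature extraction → CRF decoding) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the embedding dimension, dilation rates, block count, and tag count are assumed values, the BERT-WWM embeddings are stood in by a random tensor, and the final CRF decoding layer is omitted for brevity.

```python
import torch
import torch.nn as nn

class IDCNNBlock(nn.Module):
    """One dilated-CNN block: stacked 1-D convolutions with increasing
    dilation rates widen the receptive field over the sentence."""
    def __init__(self, hidden_dim, dilations=(1, 1, 2)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3,
                      dilation=d, padding=d)  # padding=d keeps seq length
            for d in dilations
        )

    def forward(self, x):            # x: (batch, hidden, seq_len)
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x

class IDCNNTagger(nn.Module):
    """Sketch of the tagging stack: pretrained (e.g. BERT-WWM) embeddings
    are assumed precomputed; iterated IDCNN blocks extract features and a
    linear layer emits per-token tag scores (CRF layer omitted here)."""
    def __init__(self, emb_dim=768, hidden_dim=128, num_tags=9, blocks=4):
        super().__init__()
        self.proj = nn.Conv1d(emb_dim, hidden_dim, kernel_size=1)
        self.blocks = nn.ModuleList(IDCNNBlock(hidden_dim)
                                    for _ in range(blocks))
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, embeddings):   # embeddings: (batch, seq_len, emb_dim)
        x = self.proj(embeddings.transpose(1, 2))
        for block in self.blocks:    # "iterated" = the block is applied repeatedly
            x = block(x)
        return self.out(x.transpose(1, 2))  # (batch, seq_len, num_tags)

emb = torch.randn(2, 20, 768)        # stand-in for BERT-WWM output
scores = IDCNNTagger()(emb)
print(scores.shape)                  # torch.Size([2, 20, 9])
```

In a full system, the per-token scores would be passed to a CRF layer, whose Viterbi decoding enforces valid tag transitions (e.g. an I- tag must follow a matching B- tag) when emitting the maximum-probability sequence.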
