Abstract

The feature extraction capabilities of typical pretrained models are insufficient for medical named entity recognition, and such models struggle to represent polysemous words, resulting in low recognition accuracy on electronic medical records. To address this problem, this paper proposes a new model that combines the BERT pretraining model with the BiLSTM-CRF model. First, the corpus is fed into the pretrained BERT model to obtain word embeddings that carry contextual semantic information. Then, a BiLSTM module extracts further features from BERT's encoded outputs, incorporating bidirectional context to improve the accuracy of the semantic encoding. Next, a CRF layer refines the BiLSTM outputs, selecting the highest-scoring tag sequence. Finally, extensive experimental results show that the proposed model achieves improved performance compared with other models.
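As a rough illustration of the three-stage pipeline the abstract describes, the following PyTorch sketch wires a pretrained BERT encoder into a BiLSTM and a CRF decoding layer. It assumes the HuggingFace `transformers` and `pytorch-crf` packages; the checkpoint name (`bert-base-chinese`), the hidden size, and the class and method names are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a BERT-BiLSTM-CRF tagger, assuming the
# `transformers` and `pytorch-crf` packages. Hyperparameters
# and the checkpoint name are illustrative, not the paper's.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags, bert_name="bert-base-chinese",
                 lstm_hidden=128):
        super().__init__()
        # 1) BERT yields context-sensitive embeddings, so the same
        #    token gets different vectors in different sentences
        #    (this is what handles polysemy).
        self.bert = BertModel.from_pretrained(bert_name)
        # 2) A BiLSTM re-encodes BERT's outputs, combining
        #    left-to-right and right-to-left context.
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, num_tags)
        # 3) The CRF scores whole tag sequences, enforcing valid
        #    transitions between adjacent labels.
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, input_ids, attention_mask):
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(hidden)
        return self.fc(lstm_out)

    def loss(self, input_ids, attention_mask, tags):
        # Negative log-likelihood of the gold tag sequence.
        emissions = self._emissions(input_ids, attention_mask)
        return -self.crf(emissions, tags, mask=attention_mask.bool())

    def decode(self, input_ids, attention_mask):
        # Viterbi decoding: returns the highest-scoring tag sequence.
        emissions = self._emissions(input_ids, attention_mask)
        return self.crf.decode(emissions, mask=attention_mask.bool())
```

In this arrangement the CRF replaces an independent per-token softmax, which is why it can "screen out" the best-scoring annotation sequence as a whole rather than labeling each token in isolation.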
