Abstract

Electronic medical records in traditional Chinese medicine (TCM) contain a wealth of valuable clinical information. This paper studies named entity recognition (NER) for TCM orthopedic electronic medical records using an ALBERT-BiLSTM-CRF model. The labeled data are first encoded with the ALBERT pre-trained language model to obtain dynamic word vectors. A BiLSTM layer then captures contextual semantics in both directions. Finally, the concatenated vector is fed into the CRF layer and decoded with the Viterbi algorithm to produce the entity labels. The model recognizes four types of entities, namely clinical manifestations, body parts, TCM syndrome types, and TCM disease names, with F1 values of 98.31%, 95.00%, 97.48%, and 99.96%, respectively. The overall precision, recall, and F1 value for entity recognition are 98.39%, 97.16%, and 97.76%, respectively. Compared with the BERT-RNN-CRF, BERT-GRU-CRF, and BERT-BiLSTM-CRF models, the proposed model improves precision, recall, and F1 value. It also completes entity recognition tasks efficiently, which makes it better suited to NER on TCM electronic medical records.
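The abstract does not give implementation details for the CRF decoding step, so the following is only a minimal, generic sketch of Viterbi decoding over per-token tag scores. The function name `viterbi_decode`, the three-tag scheme (O, B, I), and all score values are illustrative assumptions, not taken from the paper; in the described pipeline the emission scores would come from the BiLSTM output and the transition scores from the trained CRF layer.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token position t
    transitions[i][j] : score of moving from tag i to tag j
    (Hypothetical helper for illustration; not the paper's code.)
    """
    if not emissions:
        return []

    num_tags = len(emissions[0])
    # Best score of any path ending in each tag at the current position.
    score = list(emissions[0])
    # backpointers[t-1][j] = best previous tag when position t has tag j.
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            best_prev = max(
                range(num_tags), key=lambda i: score[i] + transitions[i][j]
            )
            new_score.append(
                score[best_prev] + transitions[best_prev][j] + emissions[t][j]
            )
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path back from the best final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Assumed tag indices: 0 = O, 1 = B, 2 = I (illustrative only).
# The large negative score forbids the invalid O -> I transition.
transitions = [
    [0.0, 0.0, -10000.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Made-up emission scores for a three-token sentence.
emissions = [
    [1.0, 0.8, 0.0],
    [0.0, 0.1, 2.0],
    [1.0, 0.0, 0.0],
]
best_path = viterbi_decode(emissions, transitions)
print(best_path)
```

In this toy input, picking the top-scoring tag for each token independently would give O, I, O, which contains the forbidden O to I transition. Decoding over the whole sequence instead selects B, I, O, which illustrates why a CRF layer with Viterbi decoding is placed on top of the per-token scores.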
