Abstract

Identifying diseases, symptoms, drugs, examinations, and other medical entities in medical text is both a significant research problem and a valuable practice, since the extracted entities support knowledge graphs, question-answering systems, and other downstream tasks that provide the public with reliable answers. However, in contrast to languages such as English, Chinese text has no explicit word boundaries, and medical entities are often long and contain nested entities of multiple types. To address these issues, this study proposes a medical named entity recognition (NER) approach that combines part-of-speech and stroke features. First, the text is fed into the BERT pre-trained model to obtain its semantic representation; in parallel, a part-of-speech feature vector is derived from a part-of-speech dictionary, and stroke features are extracted with a convolutional neural network (CNN). The word vectors are then concatenated with the part-of-speech and stroke feature vectors and passed to the BiLSTM and CRF layers for training. In addition, a class-weighted term is incorporated into the loss function to balance the disparity in data volume across entity types. Experimental results show that our model achieves an F1 score of 78.65% on the CCKS2019 dataset, outperforming many existing methods.
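The abstract only outlines the architecture, so the following is a minimal PyTorch sketch of the described feature fusion: BERT token representations concatenated with part-of-speech embeddings and CNN-extracted stroke features, fed to a BiLSTM, with a class-weighted loss over the tag emissions. The class name `FusionNER`, all dimensions, and the use of a weighted cross-entropy in place of the paper's weighted CRF objective are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class FusionNER(nn.Module):
    """Sketch: BERT + POS + stroke-CNN features -> BiLSTM -> tag emissions."""

    def __init__(self, bert_dim=768, pos_dim=32, stroke_dim=64,
                 hidden=256, num_tags=13, class_weights=None):
        super().__init__()
        # CNN over each character's stroke sequence (stroke embeddings as channels).
        self.stroke_cnn = nn.Sequential(
            nn.Conv1d(stroke_dim, stroke_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.bilstm = nn.LSTM(bert_dim + pos_dim + stroke_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, num_tags)
        # Class-weighted loss to offset imbalance across entity types; the paper
        # combines this weighting with a CRF layer, which is omitted here.
        self.loss_fn = nn.CrossEntropyLoss(weight=class_weights, ignore_index=-100)

    def forward(self, bert_out, pos_emb, stroke_emb, tags=None):
        # bert_out:   (batch, seq_len, bert_dim)  from a frozen or fine-tuned BERT
        # pos_emb:    (batch, seq_len, pos_dim)   looked up from a POS dictionary
        # stroke_emb: (batch, seq_len, stroke_dim, strokes_per_char)
        b, t, d, s = stroke_emb.shape
        stroke_feat = self.stroke_cnn(stroke_emb.view(b * t, d, s)).view(b, t, d)
        fused = torch.cat([bert_out, pos_emb, stroke_feat], dim=-1)
        hidden, _ = self.bilstm(fused)
        logits = self.emissions(hidden)
        if tags is not None:
            return self.loss_fn(logits.view(-1, logits.size(-1)), tags.view(-1))
        return logits.argmax(dim=-1)
```

In practice the class weights would be set inversely proportional to each entity type's frequency in the training set, so rare entity types contribute more to the loss.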
