Clinical Named Entity Recognition from Chinese Electronic Medical Records Based on Deep Learning Pretraining.

Lejun Gong, Zhifei Zhang, Shiqi Chen, Jiafeng Yao

doi:10.1155/2020/8829219

Abstract

Background. Clinical named entity recognition is a basic task in mining the text of electronic medical records. Chinese electronic medical records pose particular challenges: their text contains many compound entities, frequently omits sentence components, and has unclear entity boundaries. Moreover, corpora of Chinese electronic medical records are difficult to obtain. Methods. To address these characteristics, this study proposes a Chinese clinical entity recognition model based on deep learning pretraining. The model uses word embeddings pretrained on a domain corpus and fine-tunes an entity recognition model pretrained on a related corpus. BiLSTM and Transformer are then each used as the feature extractor to identify four types of clinical entities (diseases, symptoms, drugs, and operations) in the text of Chinese electronic medical records. Results. The model achieved 75.06% Macro-P, 76.40% Macro-R, and 75.72% Macro-F1 on the test dataset. Conclusions. These experiments show that the proposed Chinese clinical entity recognition model based on deep learning pretraining can effectively improve recognition performance.

Highlights

Clinical named entity recognition is a basic task in mining the text of electronic medical records; Chinese electronic medical records make it challenging because their text contains many compound entities, frequently omits sentence components, and has unclear entity boundaries
Electronic medical records (EMRs) document the symptoms and examinations of patients from before admission through hospitalization, together with what medical personnel provide based on the examination results, such as disease diagnoses and treatment methods; they are medical resources constructed by professionals
In view of the above problems, this study proposes a named entity recognition method for Chinese EMRs based on pretraining. The method is based on word embedding pretraining and on fine-tuning an entity recognition model pretrained on a related corpus

Summary

Background

Medical informatization has produced a large number of electronic medical records. The electronic medical record preserves complete, detailed information about the patients’ diagnosis and treatment process; it has the advantages of a regular writing format and convenient retrieval and storage, and it can further support telemedicine. There are relevant studies on Chinese clinical entity recognition using deep learning methods [16,17,18], whose models are mostly the sequence model RNN and its variants. There were two specific practices for implementing the deep learning pretraining mode: first, the input is initialized with EMR embeddings pretrained on a corpus from the same field; second, an entity recognition model pretrained on a related corpus is fine-tuned. It is difficult to annotate a corpus of Chinese EMRs. To make full use of the resources of previous studies, our recognition task is fine-tuned from a clinical entity recognition model (https://github.com/baiyyang/medical-entity-recognition) trained on the medical data of the CCKS2017 tasks. A variety of deep learning methods have been widely applied to named entity recognition tasks, usually the RNN model and its variants. Note that the Transformer's multi-head attention mechanism contains multiple sets of Q/K/V weight matrices, each of which is randomly initialized; after training, each set is used to embed the input word, or the vector from the previous encoder/decoder, into a different representation subspace.
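The multiple sets of Q/K/V weight matrices described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the dimensions (`d_model`, `d_head`), the Gaussian initialization scale, and the seed are all assumptions chosen for illustration, and the final output projection that a full Transformer applies after concatenating the heads is omitted. Each head has its own randomly initialized `W_q`, `W_k`, and `W_v`, so each projects the same input vectors into a different subspace before scaled dot-product attention is applied.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Matrix of small Gaussian values (illustrative initialization scale)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_heads(d_model, d_head, num_heads, seed=0):
    """Create one independent, randomly initialized Q/K/V set per head."""
    rng = random.Random(seed)
    return [
        {name: random_matrix(d_model, d_head, rng) for name in ("W_q", "W_k", "W_v")}
        for _ in range(num_heads)
    ]


def attention_head(x, head):
    """Scaled dot-product self-attention for a single head."""
    q = matmul(x, head["W_q"])  # queries in this head's subspace
    k = matmul(x, head["W_k"])  # keys in this head's subspace
    v = matmul(x, head["W_v"])  # values in this head's subspace
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, heads):
    """Run every head on the same input and concatenate results per token."""
    outputs = [attention_head(x, head)[0] for head in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


# Three token vectors of size 8, attended to by two heads of size 4 each.
tokens = random_matrix(3, 8, random.Random(1))
heads = init_heads(d_model=8, d_head=4, num_heads=2)
context = multi_head_attention(tokens, heads)
print(len(context), len(context[0]))  # number of tokens, num_heads * d_head
```

In this sketch the weights stay at their random initial values; in a trained model they would be updated by backpropagation, which is what lets the different heads specialize in different representation subspaces.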

Results and Discussion

Conclusions