The Algorithms for Word Segmentation and Named Entity Recognition of Chinese Medical Records

Yuan-Nong Ye, Meng-Ya Huang, Zhu Zeng, Liu-Feng Zheng, Tao Liu

doi:10.1007/978-3-030-78615-1_35

Abstract

A complete inpatient electronic medical record contains a large amount of information. In recent years, numerous researchers have studied word segmentation of medical texts. Because medical records are written as free text, structuring them by recognizing medical named entities is an important part of intelligent medical record analysis. At present, named entity recognition methods fall into two main groups: lexicon- and rule-based methods, and machine learning-based methods. The machine learning-based approach treats named entity recognition as a sequence labeling problem and relies mainly on context information. The features commonly used in feature construction are contextual features and dictionary features. BERT+LSTM+CRF was used to train the named entity recognition model, with the open-source CRF++ toolkit as the underlying tool. We trained the LSTM+CRF model using the original word segmentation results and context information as features, and carried out 5-fold cross-validation. The overall micro-averaged F1 score (micro-F) of named entity recognition reached 0.92, indicating that the model can accurately perform medical named entity recognition.

Keywords: Word segmentation, Named entity recognition, Natural language processing
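As an illustration of the contextual and dictionary features mentioned in the abstract, the following is a minimal sketch of how such features might be built for each token of a segmented sentence. It is not the authors' implementation: the function names, the window size of 2, the `<PAD>` marker, the sample sentence, and the small lexicon are all assumptions made for the example.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, lexicon: Set[str],
                   window: int = 2) -> Dict[str, object]:
    """Build features for the token at position i (hypothetical helper)."""
    feats: Dict[str, object] = {
        "w[0]": tokens[i],                 # the token itself
        "in_dict": tokens[i] in lexicon,   # dictionary feature
    }
    # Contextual features: neighbouring tokens within the window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"w[{off:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens: List[str], lexicon: Set[str],
                      window: int = 2) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in one segmented sentence."""
    return [token_features(tokens, i, lexicon, window)
            for i in range(len(tokens))]


# Assumed example: an already segmented sentence and a toy symptom lexicon.
tokens = ["患者", "出现", "发热", "症状"]
lexicon = {"发热", "咳嗽"}
features = sentence_features(tokens, lexicon)
```

Each feature dictionary could then be passed, together with a per-token label, to a sequence labeling model such as a CRF.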
