Abstract
Background: The Named Entity Recognition (NER) task, a key step in the extraction of health information, faces many challenges in Chinese Electronic Medical Records (EMRs). First, the casual use of Chinese abbreviations and doctors' personal writing styles can produce multiple expressions of the same entity, and there is no common Chinese medical dictionary to support accurate entity extraction. Second, electronic medical records contain entities of many categories, and the lengths of entities in different categories vary greatly, which increases the difficulty of Chinese NER. Entity boundary detection therefore becomes the key to accurate entity extraction from Chinese EMRs, and a model is needed that supports the recognition of entities of multiple lengths without relying on any medical dictionary.
Methods: In this study, we incorporate part-of-speech (POS) information into a deep learning model to improve the accuracy of Chinese entity boundary detection. To avoid incorrect POS tags on long entities, we propose a method called reduced POS tagging, which keeps the tags of general words but not of words that appear to be medical entities. The proposed model, named SM-LSTM-CRF, consists of three layers: a self-matching attention layer, which calculates the relevance of each character to the entire sentence; an LSTM (Long Short-Term Memory) layer, which captures the context features of each character; and a CRF (Conditional Random Field) layer, which labels characters based on their features and transition rules.
Results: Experimental results on a Chinese EMR dataset show that the F1 score of SM-LSTM-CRF is 2.59% higher than that of LSTM-CRF. After adding the POS feature to the model, we obtain a further improvement of about 7.74% in F1. Reduced POS tagging decreases incorrect tags on long entities, increasing the F1 score by another 2.42% and reaching a final F1 score of 80.07%.
Conclusions: The POS feature produced by reduced POS tagging, together with the self-matching attention mechanism, tightly constrains entity boundaries and performs well in the recognition of clinical entities.
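To make the layered structure above concrete, the following is a minimal PyTorch sketch of the SM-LSTM-CRF idea: character and POS embeddings are concatenated, a self-matching attention step lets each character attend to the whole sentence, a BiLSTM captures context features, and a transition matrix with Viterbi decoding stands in for the CRF layer. The dimensions, tag set, and module names are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the SM-LSTM-CRF idea (assumed names and dimensions).
import torch
import torch.nn as nn


class SMLSTMCRFSketch(nn.Module):
    def __init__(self, n_chars, n_pos, n_tags, char_dim=64, pos_dim=16, hidden=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)              # reduced POS tags
        d = char_dim + pos_dim
        self.attn = nn.Linear(d, d)                              # self-matching attention
        self.lstm = nn.LSTM(2 * d, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)                # per-character emission scores
        self.trans = nn.Parameter(torch.zeros(n_tags, n_tags))   # CRF-style transition scores

    def forward(self, chars, pos):
        x = torch.cat([self.char_emb(chars), self.pos_emb(pos)], dim=-1)   # (B, T, d)
        # Self-matching attention: relevance of each character to the whole sentence.
        scores = torch.matmul(self.attn(x), x.transpose(1, 2))             # (B, T, T)
        ctx = torch.matmul(torch.softmax(scores, dim=-1), x)               # (B, T, d)
        h, _ = self.lstm(torch.cat([x, ctx], dim=-1))                      # context features
        return self.emit(h)                                                # (B, T, n_tags)

    def viterbi_decode(self, emissions):
        # Best tag sequence for one sentence from emissions + transitions.
        T, K = emissions.shape
        score, back = emissions[0], []
        for t in range(1, T):
            total = score.unsqueeze(1) + self.trans + emissions[t].unsqueeze(0)
            score, idx = total.max(dim=0)
            back.append(idx)
        best = [int(score.argmax())]
        for idx in reversed(back):
            best.append(int(idx[best[-1]]))
        return list(reversed(best))


# Tiny usage example with random inputs (vocabulary and tag sizes are made up).
model = SMLSTMCRFSketch(n_chars=3000, n_pos=30, n_tags=9)
chars = torch.randint(0, 3000, (1, 12))
pos = torch.randint(0, 30, (1, 12))
tags = model.viterbi_decode(model(chars, pos)[0])
print(tags)
```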
Highlights
The Named Entity Recognition (NER) task, a key step in the extraction of health information, faces many challenges in Chinese Electronic Medical Records (EMRs)
This paper focuses on the problem of entity boundary detection for entities of different lengths, and proposes a deep learning model that combines part-of-speech information with a self-matching attention mechanism for named entity recognition in Chinese electronic medical records
Dataset: The dataset consists of 1,000 patient admission records, adopted from the Chinese EMR named entity recognition task at the China Conference on Knowledge Graph and Semantic Computing (CCKS) 2018
Summary
The Named Entity Recognition (NER) task, a key step in the extraction of health information, faces many challenges in Chinese Electronic Medical Records (EMRs). Electronic medical records contain entities of many categories, and the lengths of entities in different categories vary greatly, which increases the difficulty of Chinese NER. Entity boundary detection therefore becomes the key to accurate entity extraction from Chinese EMRs, and a model is needed that supports the recognition of entities of multiple lengths without relying on any medical dictionary. This paper focuses on the problem of entity boundary detection for entities of different lengths, and proposes a deep learning model that combines part-of-speech information with a self-matching attention mechanism for named entity recognition in Chinese electronic medical records
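As an illustration of how the reduced POS tagging used by the model could be realized, the sketch below tags a sentence with a general-purpose Chinese tagger and keeps only the tags of general (mostly function) words, collapsing everything else, including words that look like parts of medical entities, to a single placeholder tag. The tagger (jieba.posseg), the set of retained tags, and the placeholder are assumptions for illustration and not the authors' exact procedure.

```python
# Hedged sketch of reduced POS tagging: one reduced tag per character.
import jieba.posseg as pseg  # assumes the jieba package is installed

GENERAL_TAGS = {"v", "p", "c", "u", "d", "r", "m", "q", "t"}  # verbs, prepositions, etc.
PLACEHOLDER = "x"  # tag reserved for suspected medical-entity characters


def reduced_pos_tags(sentence: str) -> list[str]:
    """Return one reduced POS tag per character of the sentence."""
    tags = []
    for pair in pseg.cut(sentence):
        # Keep the tag only for general words; collapse everything else.
        kept = pair.flag if pair.flag in GENERAL_TAGS else PLACEHOLDER
        tags.extend([kept] * len(pair.word))
    return tags


# Example: "The patient was admitted due to acute upper respiratory tract infection."
print(reduced_pos_tags("患者因急性上呼吸道感染入院"))
```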