Abstract

Background: Named entity recognition (NER) on Chinese electronic medical/healthcare records has attracted significant attention because it can be used to build applications that understand these records. Most previous methods have been purely data-driven, requiring large-scale, high-quality labeled medical data. However, labeled data is expensive to obtain, and these data-driven methods struggle to handle rare and unseen entities.

Methods: To tackle these problems, this study presents a novel multi-task deep neural network model for Chinese NER in the medical domain. We incorporate dictionary features into the neural network, and a general secondary task, named entity segmentation, is used as an auxiliary task to improve performance on the primary task of named entity recognition.

Results: To evaluate the proposed method, we compare it with other currently popular methods on three benchmark datasets. Two of the datasets are publicly available, and the third was constructed by us. Experimental results show that the proposed model achieves an average F-measure of 91.07% on the two public datasets and an F-measure of 87.05% on the private dataset.

Conclusions: The comparison of different models demonstrates the effectiveness of our model. The proposed model outperforms traditional statistical models.

Highlights

  • With the rapid development of Electronic Medical Record (EMR) systems, there has been increasing interest in applying text mining and information extraction to EMRs

  • The main contributions of this article are as follows: (1) We present a multi-task learning framework that jointly trains a model to perform entity segmentation with a cross-entropy loss and entity recognition with Conditional Random Fields (CRF)

  • The Chinese clinical named entity recognition (NER) task is usually treated as a sequence labelling task, while the named entity segmentation (NES) task is treated as a binary classification of whether or not a token belongs to an entity
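The relationship between the two tasks in the highlights above can be illustrated with a minimal sketch. It assumes BIO-style NER tags (the tagging scheme is not stated here), derives the binary NES labels from them, and combines the two losses as a weighted sum. The function names, the `aux_weight` parameter, the `SYMPTOM` label, and the example sentence are all hypothetical and are not taken from the paper; the CRF loss is represented only by a placeholder number.

```python
import math
from typing import List


def ner_to_nes_labels(ner_tags: List[str]) -> List[int]:
    """Map sequence-labelling tags to binary labels: 1 if the token is part of an entity."""
    return [0 if tag == "O" else 1 for tag in ner_tags]


def binary_cross_entropy(probs: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy between predicted entity probabilities and NES labels."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def joint_loss(ner_loss: float, nes_loss: float, aux_weight: float = 0.5) -> float:
    """Assumed combination: primary NER loss plus a weighted auxiliary NES loss."""
    return ner_loss + aux_weight * nes_loss


# Hypothetical character-level example: "患者头痛三天" (the patient has had a headache for three days)
tokens = list("患者头痛三天")
ner_tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O"]
nes_labels = ner_to_nes_labels(ner_tags)

# Hypothetical per-token entity probabilities from a segmentation head
nes_probs = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1]
nes_loss = binary_cross_entropy(nes_probs, nes_labels)

# The CRF negative log-likelihood would come from the recognition head; 1.7 is a placeholder
total_loss = joint_loss(ner_loss=1.7, nes_loss=nes_loss)
```

Because the NES labels are derived directly from the NER annotations, the auxiliary task in this sketch needs no additional labeling effort; how the paper actually weights or combines the two losses is not specified in this summary.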


Summary

Introduction

With the rapid development of Electronic Medical Record (EMR) systems, there has been increasing interest in applying text mining and information extraction to EMRs. Among medical text mining tasks, NER is a fundamental one: it locates mentions of named entities in unstructured medical/healthcare records and classifies them (e.g. as symptoms, tests, drugs, operations and diseases). Deep models usually require a large amount of labeled data for training, while manual annotation is time-consuming. To reduce the dependence on large amounts of annotated data, some researchers have proposed integrating prior knowledge into the models [9].
