Abstract

Structured electronic medical records make medical data easier to mine and extract, and so allow valuable data resources to be used effectively. However, hospitals have accumulated a large amount of unstructured data in electronic medical records that cannot be searched effectively, which wastes these resources. In this paper, we study the problem of extracting attribute values from unstructured text in electronic medical records. From our observation of intestinal cancer diagnostic texts, the attributes fall into two categories, discriminative attributes and extractive attributes, which we handle with text classification and sequence labeling, respectively. For discriminative attributes, we first divide the text into sentences or segments, which serve as instances. Second, we fine-tune pre-trained word embeddings to capture domain-specific semantics and knowledge. Third, we use an attention mechanism to select the most important instance for the different attribute extractors. Finally, we use multi-task learning to share useful information and obtain better experimental results. For extractive attributes, we propose a novel model consisting of a BiLSTM layer, a CNN layer, and a CRF layer: the BiLSTM and CNN layers learn text features, and the CRF serves as the last layer of the model. Experiments show that our method outperforms several competitive baseline methods.
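
The instance-selection step for discriminative attributes can be illustrated with a small sketch. It is not the authors' implementation: it assumes plain dot-product attention, and the names `attend`, `instances`, and `queries`, as well as all numbers, are hypothetical. Each attribute extractor is given its own query vector, so different attributes can attend to different sentences or segments.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(instances, query):
    """Return (weights, pooled) for one attribute-specific query vector.

    instances: list of equal-length float vectors, one per sentence/segment.
    query: float vector of the same length (a hypothetical learned parameter).
    """
    # Dot-product relevance score of every instance for this attribute.
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in instances]
    weights = softmax(scores)
    # Attention-weighted sum of the instance vectors.
    dim = len(query)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, instances))
              for i in range(dim)]
    return weights, pooled


# Toy data: three instance vectors and two attribute queries (made-up numbers).
instances = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = {"attr_a": [4.0, 0.0], "attr_b": [0.0, 4.0]}

for name, query in queries.items():
    weights, pooled = attend(instances, query)
    best = max(range(len(weights)), key=weights.__getitem__)
    print(name, "attends most to instance", best)
```

In a trained model the instance vectors would come from the text encoder and the queries would be learned jointly with the classifiers; here they are fixed by hand only to show that two attributes can select different instances from the same text.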

Highlights

  • With the continuous development of science and technology, the research results on data have been gradually applied to various domains

  • We focus on extracting both discriminative and extractive attributes, which is more practical in real-world applications

  • In this paper, we use pre-trained word embeddings to better initialize the parameters of our models, and we fine-tune them on our domain corpus to capture domain-specific semantics/knowledge


Summary

Introduction

With the continuous development of science and technology, research results on data have gradually been applied to various domains. Data from Electronic Medical Records (EMR) systems has attracted the attention of researchers and has become a major research topic. EMR data contains a large amount of patients’ basic information, diagnosis reports, and medical knowledge, all of which are valuable assets in the medical domain. Only structured data can serve medical research. The main work of this paper is to transform the unstructured intestinal cancer diagnostic text into structured

