Abstract

Currently, medical institutes generally use EMR to record patient's condition, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and improve the data mining results. Different types of data require different processing technologies. Most structured data commonly needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. For semistructured or unstructured data, such as medical text, containing more health information, it requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and emphatically analyzes the key techniques. In addition, we make an in-depth study on the applications developed based on text mining together with the open challenges and research issues for future work.

Highlights

  • With the development of information technology and HIS, EMR has been popularized

  • Dimension reduction method generally controls the size of the dataset by reducing the number of random variables or attributes

  • The named-entity recognition (NER) performance is reported in terms of F-score, while the assignment of attribute values for each discovered named entity in terms of accuracy because each attribute is assigned a label

Read more

Summary

Introduction

With the development of information technology and HIS (hospital information system), EMR has been popularized. EMR (electronic medical record) or EHR (electronic health record), which medical staff uses to record texts, symbols, charts, graphics, data, and other digital information generated by HIS, refers to medical records, which could be stored, managed, transmitted, and reproduced efficiently. With the tremendous growth of the adoption of EMR, various sources of clinical information (including demographics, diagnostic history, medications, laboratory test results, and vital signs) are becoming available, which has established EMR as a treasure trove for large-scale analysis of health data. Data in EMR can be divided into three kinds: structured data, semistructured data, and unstructured data [1]. Unstructured text is one kind of narrative data, including clinical notes, surgical records, discharge records, radiology reports, and pathology reports. Unstructured texts store a lot of valuable medical information but lack common structural frameworks, and there are many errors, such as improper grammatical use, spelling errors, local dialects, Journal of Healthcare Engineering

Evaluation
General Procedure of EMR Data Processing
Classical Data Preprocessing on EMR
Information Extraction of EMR Based on Text Mining
Application of EMR Mining Technology
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.