Rule-based Disease Classification using Text Mining on Symptoms Extraction from Electronic Medical Records in Indonesian

Alfonsus Haryo Sangaji, Supeno Mardi Susiki Nugroho, Adhi Dharma Wibawa, Yuri Pamungkas

doi:10.22219/kinetik.v7i1.1377

Open Access

https://doi.org/10.22219/kinetik.v7i1.1377

Journal: Kinetik
Publication Date: Feb 28, 2022
License type: CC BY-NC-SA 4.0

Affiliation: Sepuluh Nopember Institute of Technology

Abstract

Electronic medical records (EMRs) have recently become a source of many insights for clinicians and hospital management. An EMR stores important information and new knowledge that can give hospitals and clinicians a competitive advantage. It is valuable not only for mining patterns in patient symptoms, medication, and treatment, but also as a repository from which new strategies and future trends in medicine can be derived. However, EMRs remain a challenge for many clinicians because of their unstructured form. Information extraction helps in finding valuable information in unstructured data. This study focuses on disease symptom information recorded as text. Only the diseases with the highest prevalence rates in Indonesia, such as tuberculosis, malignant neoplasm, diabetes mellitus, hypertension, and renal failure, are analyzed. Pre-processing techniques such as data cleansing and correction play a significant role in obtaining the features. Because the amount of data per disease is imbalanced, the SMOTE technique is applied to address this. Symptoms are extracted from the EMR data with a rule-based algorithm. Two algorithms, SVM and Random Forest (RF), were used to classify the disease based on the extracted features. The results showed that rule-based symptom extraction works well for extracting valuable information from unstructured EMRs. Classification accuracy was 78% for SVM and 89% for RF.
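The abstract does not list the extraction rules themselves, so the following is only a minimal sketch of what rule-based symptom extraction from Indonesian free text could look like. The symptom lexicon, the correction table, the negation cues, and the function names are all hypothetical and chosen for illustration. They are not taken from the paper. The sketch cleans a note, corrects a few abbreviations, matches symptom phrases, skips mentions directly preceded by a negation word, and turns the result into a binary feature vector that a classifier such as SVM or Random Forest could consume.

```python
import re

# Hypothetical lexicon: canonical symptom -> Indonesian surface forms.
# Illustrative assumptions only; not the rules used in the paper.
SYMPTOM_LEXICON = {
    "cough": ["batuk"],
    "fever": ["demam", "panas"],
    "shortness_of_breath": ["sesak napas", "sesak nafas", "sesak"],
    "nausea": ["mual"],
    "headache": ["sakit kepala", "pusing"],
}

# Hypothetical correction table for abbreviations and misspellings.
CORRECTIONS = {"btk": "batuk", "dmm": "demam", "sesek": "sesak"}

# Hypothetical negation cues ("not", "without").
NEGATIONS = ("tidak", "tanpa", "tdk")


def normalize(text):
    """Lowercase, drop non-letters, and apply token-level corrections."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [CORRECTIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def extract_symptoms(note):
    """Return the sorted list of symptoms asserted (not negated) in a note."""
    clean = normalize(note)
    negation = "|".join(NEGATIONS)
    found = set()
    for symptom, variants in SYMPTOM_LEXICON.items():
        for variant in variants:
            # Optional negation word captured in group 1, then the phrase.
            pattern = rf"\b(?:({negation})\s+)?{re.escape(variant)}\b"
            for match in re.finditer(pattern, clean):
                if match.group(1) is None:  # keep only non-negated mentions
                    found.add(symptom)
    return sorted(found)


def to_feature_vector(symptoms):
    """Binary vector with one position per lexicon entry, in lexicon order."""
    return [1 if name in symptoms else 0 for name in SYMPTOM_LEXICON]


note = "Pasien mengeluh btk 2 minggu, demam (+), tidak mual."
symptoms = extract_symptoms(note)
features = to_feature_vector(symptoms)
print(symptoms, features)
```

In this sketch the negated mention "tidak mual" is dropped, while the abbreviation "btk" is corrected to "batuk" before matching. A real system would need a much larger lexicon and more careful negation scoping than the single preceding token used here.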

Full Text