Applying data mining techniques to classify patients with suspected hepatitis C virus infection

Reza Safdari, Amir Deghatipour, Marsa Gholamzadeh, Keivan Maghooli

doi:10.1016/j.imed.2021.12.003


Open Access

https://doi.org/10.1016/j.imed.2021.12.003


Abstract

Background: Hepatitis C virus (HCV) is highly prevalent worldwide, and disease progression can cause severe, irreversible liver damage or even death. Prediction models built with machine learning techniques can therefore be beneficial. This study aimed to classify patients with suspected HCV infection using different classification models.

Methods: The study used an HCV dataset from the University of California, Irvine (UCI) Machine Learning Repository. Because the dataset was imbalanced, the synthetic minority oversampling technique (SMOTE) was applied to balance it. After cleaning, the dataset was divided into training and test sets to develop six classification models: support vector machine (SVM), Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), and K-nearest neighbors (KNN). The classifiers were developed in Python. Model performance was evaluated using receiver operating characteristic (ROC) curve analysis and other metrics.

Results: Across the evaluation metrics, the RF classifier performed best among the six methods, with an accuracy of 97.29%. The area under the curve (AUC) values for the LR, KNN, DT, SVM, Gaussian NB, and RF models were 0.921, 0.963, 0.953, 0.972, 0.896, and 0.998, respectively, with RF showing the best predictive performance.

Conclusion: This study applied various machine learning techniques to classify healthy and unhealthy patients. Additionally, the developed models might identify the stage of HCV based on the data on which they were trained.
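
To illustrate the balancing step named in the abstract, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbors. This is not the authors' implementation; the abstract does not say which library they used. The function name `smote`, its parameters, and the toy data are assumptions made purely for illustration, and the code uses only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    minority: list of equal-length numeric feature vectors (minority class only).
    Hypothetical helper for illustration, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of the base row (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up data: 6 majority-class rows and 2 minority-class rows.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[2.0, 2.0], [3.0, 2.5]]

# Generate enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=5)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version like this one.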

Full Text