Abstract

Health datasets often exhibit class imbalance, with healthy individuals forming the majority class and patients the minority class. Machine learning methods are frequently used to predict and detect diseases, but class imbalance poses a significant challenge for classifiers, which tend to favour the majority class and misclassify minority-class samples. Data preprocessing is therefore crucial before imbalanced data are classified. In this study, we present the GASMOTEPSO_ENN method, which combines the synthetic minority oversampling technique (SMOTE) and the edited nearest neighbour (ENN) algorithm using genetic algorithm (GA) and particle swarm optimization (PSO) heuristics, as a preprocessing method for classifying imbalanced health datasets. In the experiments, the chronic kidney disease (CKD), cerebral stroke prediction (CSP), and PIMA Indian diabetes (PID) datasets were used to assess the performance of the proposed method with metrics derived from the confusion matrix. With GASMOTEPSO_ENN preprocessing, machine learning algorithms classified each dataset into two classes, patients and healthy individuals, with the following Matthews correlation coefficient (MCC) values: 1.00 for logistic regression (LR) on the CKD dataset, 0.94 for extreme gradient boosting (XGBoost) on the CSP dataset, and 0.87 for support vector machine (SVM) on the PID dataset. The proposed method also performed well on the other metrics for all datasets, and a comparison with results reported in the existing literature indicates that it achieves superior performance.
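The abstract reports MCC, a metric derived from the confusion matrix. As a minimal sketch of why it suits imbalanced data, the snippet below computes MCC from the four confusion-matrix counts. The function name, the convention of returning 0.0 when the denominator vanishes, and the example counts are illustrative assumptions, not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here for illustration).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (not from the paper): 90 healthy, 10 patients.
# Labelling everyone as healthy gives 90% accuracy but an MCC of 0.
always_healthy = matthews_corrcoef(tp=0, tn=90, fp=0, fn=10)

# A classifier that gets every sample right reaches the maximum MCC.
perfect = matthews_corrcoef(tp=10, tn=90, fp=0, fn=0)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so a classifier that ignores the minority class cannot score well on it.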
