An Improved Ensemble Machine Learning Approach for Diabetes Diagnosis

Mohanad Mohammed Rashid, Rana Riyadh Saeed, Maher Talal Alasaady, Omar Mahmood Yaseen

doi:10.47836/pjst.32.3.19

Open Access

https://doi.org/10.47836/pjst.32.3.19

Journal: Pertanika Journal of Science and Technology
Publication Date: Apr 4, 2024
License type: CC BY-NC-ND

Abstract

Diabetes is recognized as one of the most detrimental diseases worldwide, characterized by elevated levels of blood glucose stemming from either insulin deficiency or decreased insulin efficacy. Early diagnosis of diabetes enables patients to initiate treatment promptly, thereby minimizing or eliminating the risk of severe complications. Although years of research in computational diagnosis have demonstrated that machine learning offers a robust methodology for predicting diabetes, existing models leave considerable room for improvement in terms of accuracy. This paper proposes an improved ensemble machine learning approach using multiple classifiers for diabetes diagnosis based on the Pima Indians Diabetes Dataset (PIDD). The proposed ensemble voting classifier combines five machine learning algorithms: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forests (RF), and XGBoost. We measured the accuracy of each individual model and then used the ensemble method to improve on it. The proposed approach uses a pre-processing stage of standardization and imputation and applies the Local Outlier Factor (LOF) to remove data anomalies. The model was evaluated using sensitivity, specificity, and accuracy criteria. With a reported accuracy of 81%, the proposed approach shows promise compared to prior classification techniques.

Full Text