Efficient diagnosis of diabetes mellitus using an improved ensemble method

Blessing Oluwatobi Olorunfemi, Adewale Opeoluwa Ogunde, Ahmad Almogren, Abidemi Emmanuel Adeniyi, Sunday Adeola Ajagbe, Salil Bharany, Ayman Altameem, Ateeq Ur Rehman, Asif Mehmood, Habib Hamam

doi:10.1038/s41598-025-87767-1

Open Access

Journal: Scientific Reports
Publication Date: Jan 25, 2025
License type: CC BY-NC-ND 4.0

Abstract

Diabetes is a growing health concern in developing countries and a significant cause of mortality. Machine learning (ML) approaches have been widely used to improve early detection and treatment, but several studies have reported low classification accuracy attributed to overfitting, underfitting, and noisy data. This research combines parallel and sequential ensemble ML approaches with feature selection techniques to improve classification accuracy. The Pima Indians Diabetes dataset from the UCI ML Repository served as the data source. Preprocessing consisted of cleaning the dataset by replacing missing values with column means and selecting highly correlated features using forward and backward selection. The dataset was split into a training set (70%) and a testing set (30%). Classification was implemented in Python in Jupyter Notebook, in two design phases. The first phase used J48, Classification and Regression Tree (CART), and Decision Stump (DS) to create a random forest model. The second phase used the same algorithms alongside the sequential ensemble methods XGBoost, AdaBoostM1, and Gradient Boosting, with an average voting algorithm for binary classification. In the evaluation, XGBoost, AdaBoostM1, and Gradient Boosting each achieved a classification accuracy of 100%, with F1 score, Matthews correlation coefficient (MCC), precision, recall, AUC-ROC, and AUC-PR all equal to 1.00, indicating reliable predictions of diabetes presence. Researchers and practitioners can use the predictive model developed in this work to make quick predictions of diabetes mellitus, which could save many lives.
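
The pipeline steps named in the abstract (column-mean imputation, a 70/30 split, average voting, and the MCC metric) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation, which presumably relied on library classifiers. The function names, the use of `None` to mark a missing value, the fixed shuffle seed, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import math
import random
from statistics import mean


def impute_column_means(rows):
    """Replace missing entries (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [mean([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def split_70_30(rows, labels, seed=0):
    """Shuffle the sample indices, then use the first 70% for training and the rest for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # seed is an assumption, used only for repeatability
    cut = int(round(0.7 * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def average_vote(probabilities, threshold=0.5):
    """Average the base models' predicted P(diabetic) and threshold the mean (assumed 0.5)."""
    return int(mean(probabilities) >= threshold)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The sketch imputes before splitting, following the order in which the abstract lists the steps. A common alternative is to compute the column means on the training portion only, so that no test-set statistics reach the model.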

Full Text