An Ensemble Learning Approach for Effective Prediction of Diabetes Mellitus Using Hard Voting Classifier

Mohammad Atif, Faisal Anwer, Faisal Talib

doi:10.17485/ijst/v15i39.1520

Journal: Indian Journal of Science and Technology
Publication Date: Oct 21, 2022
Citations: 5
License type: cc-by

Abstract

Objectives: Diabetes is a chronic condition characterized by high blood glucose levels, and it is one of the leading causes of death worldwide. Machine learning techniques can aid its early identification and prediction. This study uses an ensemble of machine learning algorithms to predict diabetes efficiently, with the aim of helping patients who have the disease.

Methods: Existing methods use a single model to predict diabetes, which may limit accuracy because no one model fits all datasets. We therefore propose a robust model based on ensemble learning with a hard voting classifier. The model was tested on both the Pima Indians Diabetes dataset and the Early Stage Diabetes Risk Prediction Dataset, which contain data on people with and without diabetes. For classification, the proposed ensemble hard voting classifier combines three machine learning algorithms: logistic regression, decision tree, and support vector machine.

Findings: On the Pima Indians Diabetes dataset, the proposed ensemble approach achieves the highest accuracy, precision, recall, and F1 score, each 81.17%. On the Early Stage Diabetes Risk Prediction Dataset, it achieves the highest accuracy, precision, recall, and F1 score, each 94.23%.

Novelty: The proposed methodology was evaluated experimentally against state-of-the-art techniques and basic classifiers such as K-Nearest Neighbor, Logistic Regression, Support Vector Machine, and Random Forest. The results are validated by computing the confusion matrix and ROC curve for each classifier type.

Keywords: Diabetes Detection; Machine Learning; Supervised Classification; Ensemble Classification; Hard Voting Classifier
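As a rough illustration of the ensemble described in the abstract, the sketch below builds a hard voting classifier from logistic regression, a decision tree, and a support vector machine using scikit-learn. It is not the authors' implementation. The synthetic data, the feature scaling, the train/test split, and all hyperparameters are assumptions made only so the example runs on its own. It also computes micro-averaged precision, recall, and F1, which for single-label classification always equal accuracy. That is one way the four reported metrics could coincide; the abstract does not state which averaging was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data stands in for the diabetes datasets,
# which are not bundled with scikit-learn.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hard voting: each base model casts one vote per sample and the
# majority class label wins. Scaling and hyperparameters are assumptions.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Micro-averaging pools true/false positives over both classes.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="micro"
)
cm = confusion_matrix(y_test, y_pred)

print("accuracy:", accuracy)
print("micro precision / recall / F1:", precision, recall, f1)
print("confusion matrix:")
print(cm)
```

Hard voting exposes only predicted labels, so an ROC curve for the ensemble itself would need a continuous score from some other source, for example the base models' decision scores.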

Full Text