Comparative Performance Analysis of Selected Machine Learning Algorithms and the Stacking Ensemble Method for Prediction of the Type II Diabetes Disease

Nathan Zoakah, Abel Ajibesin, Augustine Shey Nsang, Ayuba Zoakah

doi:10.54287/gujsa.1531997

Abstract

Diabetes is a prevalent non-communicable disease affecting many people globally. Common risk factors include obesity, age, lack of exercise, lifestyle, genetic factors, high blood pressure, and poor diet. Early identification of the condition can help prevent subsequent complications, including heart attacks, lower limb amputations, nerve damage, and blindness. Over the years, data mining and machine learning have become popular and successful methods of identifying numerous diseases, including diabetes, from clinical data. This study focuses on the principles and processes of the Naïve Bayes, Support Vector Machines, Logistic Regression, Decision Tree, and Random Forest algorithms for diabetes prediction, using the built-in Scikit-learn libraries for the experiments. Furthermore, we combine all five machine learning models into a single stacked ensemble model. Data preprocessing techniques such as scaling, missing data removal, dimensionality reduction, and balancing of the target class were applied to the Jos Urban Diabetes dataset used for this study. A comparison of the algorithms' performance across various evaluation metrics demonstrates that the Support Vector Machines algorithm outperforms all others in terms of Accuracy, Precision, Sensitivity, and Matthews Correlation Coefficient, with scores of 96.11%, 91.61%, 85.67%, and 82.59% respectively under 10-fold cross-validation. The Stacked Ensemble Method model achieved the best Area Under the Receiver Operating Characteristic Curve score, 98.47%, under 10-fold cross-validation.
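The stacking setup described in the abstract can be illustrated with a minimal Scikit-learn sketch. This is not the authors' code: the Jos Urban Diabetes dataset is not reproduced here, so a synthetic dataset from `make_classification` stands in for it, and the choice of logistic regression as the meta-learner, the default hyperparameters, and the dictionary `mean_scores` are all assumptions made for illustration. Only scaling is shown from the preprocessing steps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the real study uses clinical records.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# The five base learners named in the abstract, with default-style settings
# (hyperparameters are assumptions, not the paper's values).
base_learners = [
    ("nb", GaussianNB()),
    ("svm", SVC(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Stacked ensemble: base-learner predictions feed a meta-learner.
# Logistic regression as meta-learner is an assumption.
stack = make_pipeline(
    StandardScaler(),
    StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
)

# The metrics reported in the abstract (sensitivity is recall).
scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "sensitivity": "recall",
    "mcc": make_scorer(matthews_corrcoef),
    "roc_auc": "roc_auc",
}

# 10-fold stratified cross-validation, as in the abstract.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(stack, X, y, cv=cv, scoring=scoring)

mean_scores = {name: float(np.mean(results[f"test_{name}"])) for name in scoring}
for name, value in mean_scores.items():
    print(f"{name}: {value:.4f}")
```

Placing the scaler inside the pipeline means it is refit on each training fold, which avoids leaking test-fold statistics into the evaluation. The same `scoring` dictionary and `cv` object could be passed to `cross_validate` with any single base learner to compare it against the ensemble on equal terms.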


More From: Gazi University Journal of Science Part A: Engineering and Innovation

Similar Papers

Enhanced Dermatoscopic Skin Lesion Classification using Machine Learning Techniques
P Varalakshmi ... V Aruna Devi
25 Mar 2021

Opinion Mining of Customer Reviews Using Supervised Learning Algorithms
Shibbir Ahmed Arif ... Taslima Binte Hossain
17 Dec 2021

The performance of Logistic Regression, Decision Tree, KNN, Naive Bayes and SVM for identifying Automotive Cybersecurity Attack and Prevention: An Experimental Study
Vaishali Mishra, Sonali Kadam
Journal of Electrical Systems | VOL. 20
04 Apr 2024

Fake News Detection System using Logistic Regression and Compare Textual Property with Support Vector Machine Algorithm
N Leela Siva Rama Krishna ... M Adimoolam
07 Apr 2022
