Diabetes Prediction Using Boosting Algorithms: Performance Comparison

Gururaj N Kulkarni, Sateesh Ambesange, A Vijayalaxmi, A Preethi

doi:10.1007/978-3-030-91244-4_18

Abstract

Early detection and control of diabetes can help prevent the associated long-term health risks to the heart, lungs, kidneys, and nervous system. In this work we develop a boosting ensemble machine learning (ML) model to predict diabetes from the Pima Indian Diabetes Dataset (PIDD). The dataset is preprocessed to enhance the learning ability of the model, using standardization, outlier removal, data balancing, and dimension reduction. The performance of several ML algorithms, namely Logistic Regression (LR), Random Forest (RF) classifier, AdaBoost, and Extreme Gradient Boost (XGBoost), is compared to select the best model. The metrics used for comparison are Accuracy, Recall, Precision, F1-Score, and Area under the ROC Curve (AUC-ROC). Since the application is medical diagnosis, the cost of a false negative is of utmost importance, so Recall played a significant role in selecting the best model. Among the basic ML models (LR and RF), RF with a power transformer achieved the highest prediction accuracy and recall, 0.968 and 0.924 respectively. The boosting ensemble ML models predicted diabetes with better performance metrics, which were further improved by hyper-parameter tuning. The AdaBoost-based model achieved an accuracy of 0.966 and a recall of 0.97. The best model for predicting diabetes on PIDD in this work is the hyper-parameter-tuned XGBoost model, with accuracy and recall both equal to 1.
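To illustrate why Recall can separate models that look equal on Accuracy, the following is a minimal sketch of four of the metrics named in the abstract (AUC-ROC is left out because it needs predicted scores rather than labels). The helper `binary_metrics`, the `BinaryMetrics` container, and the toy label lists are hypothetical and are not taken from the paper or from PIDD.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (illustrative only): 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two diabetic cases
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # two false alarms, no misses

metrics_a = binary_metrics(y_true, model_a)
metrics_b = binary_metrics(y_true, model_b)
print(metrics_a)
print(metrics_b)
```

In this toy case both hypothetical models have the same accuracy (0.8), but the first has a recall of 0.5 and the second a recall of 1.0, so a recall-driven selection would prefer the second despite its lower precision.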
