Oversample‐select‐tune: A machine learning pipeline for improving diabetes identification

Sujit Kumar Das, Pinki Roy, Arnab Kumar Mishra

doi:10.1002/cpe.6741

Abstract

Diabetes is one of the most common chronic diseases and causes severe, life‐threatening complications. It is therefore important to diagnose diabetes at an early stage to avoid health and financial burdens. In this work, a systematic, data‐driven architecture based on a machine learning (ML) pipeline is proposed to identify diabetes. The proposed ML pipeline consists of the support vector machine‐synthetic minority oversampling technique (SVM‐SMOTE), followed by multiple tree‐based feature selection (FS) approaches and ensemble learners. Bayesian optimization (BO) is used to tune the classifiers' hyperparameters. Used together, SVM‐SMOTE, FS, and BO substantially improved classifier performance on the highly imbalanced Virginia dataset. The proposed model also proved useful on the comparatively less imbalanced Pima Indian Diabetes (PID) dataset. Among all classifiers used, random forest (RFC) achieved the highest sensitivity on the PID dataset (91.44%), and AdaBoost (ABC) achieved the highest sensitivity on the Virginia dataset (88.53%). XGBoost (XGB) and AdaBoost (ABC) achieved the highest AUC on the PID and Virginia datasets, 92.08% and 88.27%, respectively. These results suggest that the proposed approach could have high practical utility in real medical diagnostic settings.
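
The abstract reports results as sensitivity and AUC, which are more informative than plain accuracy when one class is rare. The following is a minimal sketch of how these two metrics can be computed, not the authors' code. The function names `sensitivity` and `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("y_true must contain both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 diabetic (1) and 7 non-diabetic (0) patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

# Threshold the scores at 0.5 to obtain hard class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"sensitivity = {sensitivity(y_true, y_pred):.4f}")
print(f"AUC         = {roc_auc(y_true, scores):.4f}")
```

On this toy data the classifier misses one of the three positive cases. A model that predicted "non-diabetic" for every patient would still be 70% accurate here while having zero sensitivity, which is why sensitivity is a natural headline metric for imbalanced diagnostic data.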

Journal: Concurrency and Computation: Practice and Experience
Publication Date: Nov 30, 2021
Citations: 6

Similar Papers

RFFE - Random Forest Fuzzy Entropy for the classification of Diabetes Mellitus.
A Usha Ruby ... Bn Chaithanya
AIMS public health | VOL. 10
01 Jan 2023

A novel evolutionary ensemble prediction model using harmony search and stacking for diabetes diagnosis
Zaiheng Zhang ... Wenzong Zhu
Journal of King Saud University - Computer and Information Sciences | VOL. 36
12 Dec 2023

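Kombinasi Metode Correlated Naive Bayes dan Metode Seleksi Fitur Wrapper untuk Klasifikasi Data Kesehatan [Combination of the Correlated Naive Bayes Method and the Wrapper Feature Selection Method for Health Data Classification]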
Hairani Hairani ... Muhammad Innuddin
Jurnal Teknik Elektro | VOL. 11
27 Apr 2020

A Computational Intelligence Technique for Effective and Early Diabetes Detection using Rough Set Theory
Allam Apparao ... P V Nageswara Rao
International Journal of Computer Applications | VOL. 95
18 Jun 2014
