Abstract

Diabetes is one of the most common chronic diseases and can cause severe, life-threatening complications. It is therefore important to diagnose diabetes at an early stage to avoid health and financial burdens. In this work, a systematic, data-driven architecture based on a machine learning (ML) pipeline is proposed to identify diabetes. The proposed ML pipeline consists of the support vector machine-synthetic minority oversampling technique (SVM-SMOTE), followed by multiple tree-based feature selection (FS) approaches and ensemble learners. Further, Bayesian optimization (BO) is used to tune the classifiers' hyperparameters. Using SVM-SMOTE, FS, and BO together substantially improved classifier performance on the highly imbalanced Virginia dataset. The proposed model also proved useful on the comparatively less imbalanced Pima Indian Diabetes (PID) dataset. Among all classifiers used, the random forest classifier (RFC) achieved the highest sensitivity on the PID dataset (91.44%), and the AdaBoost classifier (ABC) achieved the highest sensitivity on the Virginia dataset (88.53%). The XGBoost (XGB) and AdaBoost (ABC) classifiers achieved the highest AUC, 92.08% and 88.27%, on the PID and Virginia datasets, respectively. These results suggest that the proposed approach can have high practical utility in real medical diagnostic settings.
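
For readers less familiar with the two reported metrics, the following minimal sketch shows how sensitivity and AUC can be computed from labels and classifier scores. It is an illustration only, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy data are assumptions made here, and none of the numbers come from the PID or Virginia datasets.

```python
def sensitivity(y_true, y_pred):
    """Sensitivity (recall of the positive class): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (hypothetical data): 3 diabetic, 5 non-diabetic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}")
print(f"sensitivity={sensitivity(y_true, y_pred):.2f}")
print(f"auc={auc(y_true, scores):.2f}")
```

In this toy case accuracy is higher than sensitivity, because the majority class inflates accuracy while one of the three positive cases is missed. This is why sensitivity and AUC are the more informative figures on imbalanced data.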
