Abstract

Diabetes is one of the most common chronic diseases and causes severe, life-threatening complications. Therefore, it is important to diagnose diabetes at an early stage to avoid health and financial burdens. In this work, a systematic, data-driven architecture based on a machine learning (ML) pipeline is proposed to identify diabetes. The proposed ML pipeline consists of the support vector machine-synthetic minority oversampling technique (SVM-SMOTE), followed by multiple tree-based feature selection (FS) approaches and ensemble learners. Further, Bayesian optimization (BO) is used to tune the classifiers' hyperparameters. Used together, the SVM-SMOTE, FS, and BO methods substantially improved the classifiers' performance on the highly imbalanced Virginia dataset. The proposed model also proved useful on the comparatively less imbalanced Pima Indian Diabetes (PID) dataset. Among all classifiers used, random forest (RFC) achieved the highest sensitivity on the PID dataset (91.44%), and AdaBoost (ABC) achieved the highest sensitivity on the Virginia dataset (88.53%). XGBoost (XGB) and AdaBoost (ABC) achieved the highest AUC, 92.08% and 88.27%, on the PID and Virginia datasets, respectively. These results suggest that the proposed approach can have high practical utility in real medical diagnostic settings.
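
The abstract reports classifier quality as sensitivity and AUC. As a reading aid, the sketch below shows how these two metrics are commonly defined, using only the Python standard library. It is not the authors' evaluation code: the function names `sensitivity` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and unrelated to the PID or Virginia data.

```python
def sensitivity(y_true, y_pred):
    """Recall on the positive (diabetic) class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.5, 0.1, 0.7, 0.3]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))      # 0.9375
```

Sensitivity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason imbalanced-data studies tend to report both.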
