Abstract

Early detection of individuals at the highest risk of developing diabetes is crucial to reducing the disease's prevalence and slowing its progression. We therefore aim to build a data-driven predictive application for screening subjects at high risk of developing Type 2 Diabetes mellitus (T2DM) in the western region of Saudi Arabia. To this end, we designed and implemented a questionnaire-based cross-sectional study using conventional diabetes risk factors to study the prevalence of the disease and the association between the outcome and the exposure(s). We used the Chi-squared test and binary logistic regression to analyze and screen the most significant diabetes risk factors for T2DM risk prediction. The Synthetic Minority Over-sampling Technique (SMOTE), a class balancer, was used to balance the cross-sectional data. We used the class-balanced data to identify the classification algorithm that classifies patients at high risk of diabetes with the highest F1 score. The hyper-parameters of the best-performing classifier were further tuned using 10-fold cross-validation to improve the F1 score. Additionally, we validated our proposed model against existing models built using the National Health and Nutrition Examination Survey (NHANES) dataset and the Pima Indian Diabetes (PID) dataset. The results of the Chi-squared test and binary logistic regression showed that the exposures Smoking, Healthy diet, Blood Pressure (BP), Body Mass Index (BMI), Gender, and Region contributed significantly (p < 0.05) to the prediction of the response variable (subjects at high risk of diabetes). The tuned two-class Decision Forest (DF) model performed best, with an average F1 score of 0.8453 ± 0.0268. Moreover, the DF-based model adapted reasonably well to other diabetes datasets.
An Application Programming Interface (API) for the tuned DF model was implemented and deployed as a web service at https://type2-diabetes-risk-predictor.herokuapp.com, and the implementation code is available at https://github.com/SAH-ML/T2DM-Risk-Predictor.
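
The class-balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is plain Python for illustration only and is not the authors' implementation; the function names (`smote`, `balance`), the two features (BMI and systolic BP), and the toy values are all hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: [BMI, systolic BP]; label 1 = high risk (minority).
X = [[22.0, 115.0], [24.5, 120.0], [23.1, 118.0], [25.0, 122.0],
     [21.7, 110.0], [26.2, 125.0],
     [31.0, 140.0], [33.5, 150.0], [29.8, 138.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling is usually applied only to the training portion of each split so that synthetic points do not leak into the evaluation data.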

Highlights

  • Type 2 Diabetes mellitus (T2DM) is a chronic metabolic disorder characterized by insulin resistance and high levels of glucose (a sugar) in the blood

  • According to World Health Organization (WHO) reports, approximately 3 million people in KSA have prediabetes, and around 7 million of the Kingdom's population are affected by diabetes and its associated vascular complications [2]

  • The International Diabetes Federation (IDF) has developed an online questionnaire-based diabetes risk assessment tool, based on the Finnish Diabetes Risk Score (FINDRISC) [4], to predict an individual's risk of developing diabetes in the coming years

Summary

INTRODUCTION

Type 2 Diabetes mellitus (T2DM) is a chronic metabolic disorder characterized by insulin resistance and high levels of glucose (a sugar) in the blood. Machine learning (ML)-based applications can convert point-in-time data into the knowledge needed to build predictive data-mining tools that identify patients at high risk of diabetes [17,18,19,20,21,22,23,24]. The precision and recall of such a web-based model are not satisfactory, so its predictions are not trustworthy. In this regard, we intend to build an ML-based application to predict the risk of T2DM in Saudis based on specific diabetes risk factors. Our application, based on typical diabetes risk factors, will be the first ML-based real-time tool for predicting the risk of diabetes in individuals from the western region of the Kingdom of Saudi Arabia.
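
Since the argument above rests on precision and recall, and the study reports the F1 score, a short sketch of how the three metrics relate may help. The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = high risk of diabetes, 0 = not high risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")  # prints "0.60 0.75 0.67"
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high. That makes it a stricter summary than accuracy when high-risk subjects are the minority class.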

SURVEY METHODOLOGY
BINARY LOGISTIC REGRESSION
DATA PREPROCESSING FOR MODEL BUILDING AND CLASSIFIER COMPARISON
EVALUATION
TUNING HYPERPARAMETERS OF BEST PERFORMING ALGORITHM
COMPARING THE PERFORMANCE AND
RESULTS AND DISCUSSION
Method
CONCLUSION AND FUTURE SCOPE
