According to data released by the International Diabetes Federation in 2022, 537 million people worldwide already live with diabetes. It is a common and pervasive condition that has become the second leading cause of death among modern diseases, second only to cancer in the harm it causes to the human body. Its high morbidity and mortality, the shortened life expectancy, and the economic and other costs associated with it make it a major burden on those affected. This study used the Diabetes Health Indicators Dataset from Kaggle to build machine learning models for predicting the risk of diabetes. The independent variables were the presence of high blood pressure (High BP), high cholesterol (High Chol), body mass index (BMI), smoking status (Smoker), health status, age, education, income, and gender; the dependent variable was whether or not the respondent had diabetes. 70% of the data were randomly selected to train diabetes prediction models based on five traditional machine learning algorithms (random forest, logistic regression, decision tree, Support Vector Machine (SVM), and XGBoost) and a deep neural network (DNN). The models were then validated on the remaining 30% of the samples. Three indicators, Accuracy, AUC, and Recall, were compared and analyzed, and the DNN was found to have a high prediction accuracy (75.49%) and recall (79%).
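The evaluation protocol described above (a random 70/30 split, then comparison by Accuracy, AUC, and Recall) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the function names, the fixed seed, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and no model from the study is reproduced here.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in a 70/30 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the model also predicts as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a sketch but slow on large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical labels (1 = diabetes) and model scores.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # threshold is an assumption

print("accuracy:", accuracy(toy_labels, toy_preds))
print("recall:  ", recall(toy_labels, toy_preds))
print("auc:     ", auc(toy_labels, toy_scores))
```

Recall matters alongside accuracy in this setting because a missed diabetic case (a false negative) is usually the costlier error, while AUC summarizes ranking quality independently of any single threshold.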