Objective: To compare the predictive performance of random forest, BP (back-propagation) neural network, gradient boosting tree, and naive Bayes models for diabetes. In practice, such a model could take basic measurements such as an individual's height, weight, and triglyceride level, estimate that individual's probability of having the disease, and so guide targeted improvement of specific indicators as a preventive intervention, offering new ideas for diabetes prevention research.

Methods: Using the 2009 survey data from the China Health and Nutrition Survey (CHNS), data for men and women were divided into four groups by quartile of the visceral adiposity index (VAI) and analyzed statistically. The processed samples were then split into training and test sets at a ratio of 4:1, and four machine learning models were constructed: random forest, BP neural network, gradient boosting tree, and naive Bayes. The experiments used five-fold cross-validation, and predictive performance was evaluated with indicators such as sensitivity, accuracy, and AUC.

Results: One-way ANOVA showed statistically significant differences (P<0.05) among the VAI quartile groups in height, weight, waist circumference, triglycerides, high-density lipoprotein cholesterol, body mass index, fasting blood glucose, and glycosylated hemoglobin. For the four models, in the order listed above, sensitivity was 75.75%, 90.77%, 76.31%, and 98.57%; accuracy was 74.80%, 87.82%, 74.64%, and 92.00%; AUC was 0.713, 0.716, 0.668, and 0.676; and the Youden index was 0.34, 0.27, 0.22, and 0.21.

Conclusion: Based on the CHNS 2009 survey data, the BP neural network model showed better performance and stability in predicting diabetes.
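For readers unfamiliar with the evaluation indicators named above, the following is a minimal sketch of how sensitivity, accuracy, AUC, and the Youden index (sensitivity + specificity - 1) can be computed for a binary classifier. It is not the study's code: the function names, the 0.5 decision threshold, and the ten-sample data set are assumptions made for illustration only, and the sketch assumes both classes are present in the labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    youden: float


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from 0/1 labels and 0/1 predictions.

    Assumes at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / len(pairs)
    youden = sensitivity + specificity - 1
    return BinaryMetrics(sensitivity, specificity, accuracy, youden)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic illustration only: 4 positives, 6 negatives (not CHNS data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
print(metrics)
print("AUC:", auc(y_true, scores))
```

Sensitivity, accuracy, and the Youden index depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the four indicators can rank models differently.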