Comparative Research on Diabetes Influencing Factors Based on Random Forest and Decision Tree Models

Liping Li

doi:10.54097/7m4x7j04

Abstract

As society has developed rapidly, the prevalence of diabetes has risen sharply, driven by factors such as changing eating habits, unhealthy lifestyles and an aging population. Identifying the factors that influence diabetes mellitus is therefore of great significance. This paper takes the Pima Indian Diabetes data set from UCI as its experimental data and builds two models on it, a Random Forest and a Decision Tree. The performance of the two models is compared on four indicators: accuracy, precision, recall and F1-score. The Random Forest model reaches an F1-score of 79%, while the Decision Tree model reaches 72%, so the Decision Tree model is slightly inferior to the Random Forest model. Among the predictive variables selected in this article, the most important predictors of diabetes are blood glucose, body mass index, age and diabetes pedigree function.
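The four indicators named in the abstract can be illustrated with a minimal sketch that uses the standard definitions for a binary classifier. The function name `classification_metrics` and the toy label lists are hypothetical and chosen only for illustration. They are not the paper's code, data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels (1 = diabetic, 0 = non-diabetic), NOT from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because F1 combines precision and recall into a single number, it is a reasonable headline figure when the two classes are imbalanced, which may be why the abstract reports it for both models.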
