Background and aims: Identifying individuals at risk of type 2 diabetes mellitus (T2DM) and determining the associated risk factors is highly important. This study therefore aimed to explore the risk factors for T2DM and to develop a prediction model using decision tree (DT) and random forest (RF) models.

Methods: This cross-sectional study is part of the Kharameh Cohort Study. The Kharameh Cohort is a part of the Fars Cohort, which started in 2014 with 10,663 people aged 40–70 years. The risk factors of T2DM were explored using two data mining methods, DT and RF. Accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were used to evaluate the models. The data were analyzed using R software.

Results: The DT model showed that age, triglycerides (TG), physical activity, systolic blood pressure, low-density lipoproteins (LDL), and body mass index (BMI) were the factors most strongly associated with T2DM, while the RF model identified fasting blood sugar, cholesterol, creatinine, TG, gamma-glutamyl transferase, physical activity, BMI, and LDL as the most influential factors for T2DM. The RF model was superior to the DT on all of the applied criteria. Sensitivity, specificity, accuracy, and AUC for the RF were 73.4, 70.1, 73.5, and 79.1, respectively. The corresponding values for the DT were 63.8, 69.7, 62.8, and 66.8.

Conclusion: The results indicate a strong association between several risk factors and the risk of T2DM. Predictive analytics using the RF model may therefore also be applied to identify the risk factors of other chronic diseases.
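As an illustration of the four evaluation criteria named in the Methods, the following is a minimal Python sketch of how sensitivity, specificity, accuracy, and AUC can be computed from true labels and predicted probabilities. It is not the authors' code (the study used R), and the function names, the 0.5 classification threshold, and the toy data are assumptions made only for this example.

```python
from typing import Dict, Sequence


def auc_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy, and AUC.

    The threshold of 0.5 is an assumption for illustration only.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / len(y_true),    # overall agreement
        "auc": auc_score(y_true, y_score),      # threshold-independent
    }


# Hypothetical toy data: 1 = T2DM, 0 = no T2DM; scores are model probabilities.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
toy_metrics = evaluate(toy_labels, toy_scores)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the four criteria are usually reported together when comparing models such as DT and RF.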