Abstract

Background: With changing lifestyles and an aging population, the prevalence of diabetes mellitus (DM) is increasing year by year. How to apply artificial intelligence (AI) to the management of DM has become an important topic for health care systems worldwide. Objective: To compare the value of different machine learning (ML) algorithms and traditional logistic regression (LR) in predicting risk factors for DM. Methods: This was a 3-year retrospective cohort study conducted as part of the China REACTION study. The data came from a natural population of 4,314 people who took part in chronic disease surveys in Wuyishan City, Fujian Province, China, from March 2011 to January 2015. Of these, 3,687 participants were included in the final data analysis. Traditional LR and five ML classifiers, namely random forest (RF), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), support vector machine (SVM), and multi-layer perceptron (MLP), were used to screen DM risk factors and select the optimal prediction model. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, negative predictive value, F1-score, and precision. The classifier with the best performance was used to construct the prediction model. Results: Using XGBoost, RF, and AdaBoost, we identified the 10 most relevant risk factors for DM from 51 variables: HbA1c, impaired glucose tolerance, fasting serum insulin, age, visceral adiposity index, neck circumference, gamma-glutamyl transpeptidase, brachial-ankle pulse wave velocity, systolic blood pressure, and body mass index. In the selection of prediction models, the XGBoost classifier performed best, with an AUC of 0.872, followed by the LR model (0.867), RF model (0.824), SVM model (0.680), and MLP model (0.674). We therefore used the XGBoost classifier to construct the DM prediction model.
With 10-fold cross-validation, the AUC of the XGBoost model in the testing set reached as high as 0.904. Conclusion: Compared with traditional LR, ML achieved higher accuracy in constructing predictive models. ML models are simple and efficient classifiers that could be used to identify risk factors for chronic diseases. Funding: This work was supported by the Chinese Medical Association Foundation and Chinese Endocrine Society (Grant 12020240314), the National Natural Science Foundation of China (Grant 81270874), the Natural Science Foundation of Fujian Province (Grant 2011J06012), and the Ministry of Science and Technology (Grant 2016YFC0901203). Declaration of Interest: None to declare. Ethical Approval: The research protocol was approved by the ethics committee of Fujian Provincial Hospital.
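
For readers unfamiliar with the evaluation metrics named in the Methods, the following is a minimal Python sketch of how sensitivity, specificity, precision, negative predictive value, F1-score, and AUC can be computed for a binary DM classifier. It is an illustration only, not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions introduced here, and the AUC uses the standard pairwise-ranking (Mann-Whitney) formulation.

```python
from typing import Dict, Sequence, Tuple


def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = DM and 0 = no DM."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    npv = _ratio(tn, tn + fn)           # negative predictive value
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data (hypothetical, not from the study): true labels and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
threshold = 0.5  # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("AUC:", auc)
```

Unlike the other five metrics, AUC does not depend on the decision threshold, which is one reason it is commonly used as the headline figure when comparing classifiers.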
