Construction of Xinjiang metabolic syndrome risk prediction model based on interpretable models

Yan Zhang, Lei Yang, Xiaoxu Zhang, Mingqin Cao, Deyang Li, Jaina Razbek, Mayisha Daken, Hongkai Mao, Liangliang Bao, Wenjun Xia

doi:10.1186/s12889-022-12617-y

Open Access

Abstract

Background: We aimed to construct simple and practical metabolic syndrome (MetS) risk prediction models based on data from inhabitants of Urumqi and to provide a methodological reference for the prevention and control of MetS.

Methods: This cross-sectional study was conducted in the Xinjiang Uygur Autonomous Region of China. We collected data from inhabitants of Urumqi from 2018 to 2019, including demographic characteristics, anthropometric indicators, living habits and family history. Resampling techniques were used to address class imbalance in the data, and MetS risk prediction models were then constructed using logistic regression (LR) and decision tree (DT). Nomograms and DT tree diagrams were used to explain and visualize the models.

Results: Of the 25,542 participants included in the study, 3,267 (12.8%) were diagnosed with MetS and 22,275 (87.2%) were not. Both the LR and DT models based on the random undersampling dataset had good area under the receiver operating characteristic curve (AUROC) values (0.846 and 0.913, respectively). The accuracy, sensitivity, specificity and AUROC values of the DT model were higher than those of the LR model. In the LR model based on the random undersampling dataset, exercise such as walking (OR = 0.769) and running (OR = 0.736) was protective against MetS, whereas age 60 to 74 years (OR = 1.388), previous diabetes (OR = 8.902), previous hypertension (OR = 2.830), fatty liver (OR = 3.306), smoking (OR = 1.541), high systolic blood pressure (SBP; OR = 1.044) and high diastolic blood pressure (DBP; OR = 1.072) were risk factors. The DT model had 7 depth layers and 18 leaves. Body mass index (BMI), the root node of the DT, was the most important factor affecting MetS, followed in descending order of importance by SBP, previous diabetes, previous hypertension, DBP, fatty liver, smoking and exercise.

Conclusions: Both the DT and LR MetS risk prediction models show good predictive performance, and each has its own characteristics. Combining the two methods to construct an interpretable MetS risk prediction model can provide a methodological reference for the prevention and control of MetS.
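The odds ratios reported in the abstract can be read back as logistic regression coefficients, since OR = exp(beta) for each predictor. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes a model without interaction terms, so that odds ratios multiply across factors, and it assumes the SBP odds ratio is expressed per unit (for example per mmHg), which the abstract does not state. The dictionary keys and function names are hypothetical.

```python
import math

# Odds ratios quoted in the abstract for the LR model fitted on the
# random undersampling dataset. The key names are illustrative only.
ODDS_RATIOS = {
    "walking": 0.769,
    "running": 0.736,
    "age_60_74": 1.388,
    "previous_diabetes": 8.902,
    "previous_hypertension": 2.830,
    "fatty_liver": 3.306,
    "smoking": 1.541,
    "sbp": 1.044,  # assumed to be per unit of SBP; unit not given in the abstract
    "dbp": 1.072,  # assumed to be per unit of DBP; unit not given in the abstract
}


def coefficient(odds_ratio):
    """Return the logistic regression coefficient beta = ln(OR)."""
    return math.log(odds_ratio)


def combined_odds_ratio(profile):
    """Return exp(sum(beta * x)) for a profile of {factor: value}.

    Assumes no interaction terms, so odds ratios combine multiplicatively.
    """
    log_odds = sum(
        coefficient(ODDS_RATIOS[name]) * value for name, value in profile.items()
    )
    return math.exp(log_odds)


# Previous diabetes and previous hypertension together, relative to neither.
both = combined_odds_ratio({"previous_diabetes": 1, "previous_hypertension": 1})

# A 10-unit rise in SBP, under the per-unit assumption above.
sbp_10 = combined_odds_ratio({"sbp": 10})

print(f"diabetes + hypertension: {both:.2f}")
print(f"SBP +10 units: {sbp_10:.2f}")
```

Coefficients below zero (walking, running) correspond to odds ratios below 1, which is why the abstract describes them as protective factors. These multipliers apply to odds, not to probabilities, and the reference categories used by the authors are not given in the abstract.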

Highlights

Metabolic syndrome (MetS) is a type of metabolic disorder characterized by central obesity, hypertension, hyperglycaemia and dyslipidaemia [1]
To build the risk prediction models and compare their classification performance, we selected statistically significant variables for logistic regression (LR) (Models 1 to 5) and decision tree (DT) (Models 6 to 10) multivariate analysis based on five datasets: the original imbalanced training dataset and datasets produced by random oversampling, random undersampling, hybrid sampling, and the synthetic minority oversampling technique (SMOTE)
Compared with the original dataset, the random oversampling, random undersampling, hybrid sampling and SMOTE datasets gave lower accuracy and specificity for both LR and DT but higher sensitivity and area under the receiver operating characteristic curve (AUROC) values. Both the LR and DT models based on the random undersampling dataset had better AUROC values
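The resampling strategies and evaluation metrics named in the highlights can be illustrated with a short, standard-library-only sketch. It is not the authors' implementation, and the source does not say which software they used. It covers only random undersampling and random oversampling (hybrid sampling and SMOTE are omitted), uses a toy dataset whose 13:87 class ratio merely echoes the reported prevalence, and all function names are hypothetical.

```python
import random


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    keep = majority + minority + extra
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data: 13 positives and 87 negatives.
toy_rows = list(range(100))
toy_labels = [1] * 13 + [0] * 87

under_rows, under_labels = random_undersample(toy_rows, toy_labels)
over_rows, over_labels = random_oversample(toy_rows, toy_labels)
print(len(under_labels), sum(under_labels))
print(len(over_labels), sum(over_labels))

metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
area = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(metrics, area)
```

On imbalanced data a classifier can reach high accuracy and specificity by mostly predicting the majority class. Balancing the training set shifts predictions towards the minority class, which is consistent with the trade-off described in the highlights: lower accuracy and specificity, higher sensitivity.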

Summary

Introduction

Metabolic syndrome (MetS) is a type of metabolic disorder characterized by central obesity, hypertension, hyperglycaemia and dyslipidaemia [1]. The prevalence of MetS is rising owing to rapid economic growth, an ageing population, lifestyle changes and obesity. Health checkups are the first stage of disease prevention, and data mining of medical checkup information can help identify people at high risk of MetS at an early stage, allowing prevention and control to begin earlier. Constructing MetS risk prediction models from physical examination data is therefore important for the prevention and control of MetS. We aimed to construct simple and practical MetS risk prediction models based on data from inhabitants of Urumqi and to provide a methodological reference for the prevention and control of MetS.

Zhang et al. BMC Public Health (2022) 22:251

Objectives

Methods

Results

Discussion

Conclusion