Abstract

Introduction: Machine learning models hold potential for improving cardiometabolic disease prediction and for making prevention and intervention efforts more effective. Current models are limited by their use of cross-sectional datasets and of variables (features) that are themselves diagnostic criteria (such as blood glucose or hemoglobin A1c in type 2 diabetes models), and they overlook health behaviors and social determinants of health (HB-SDH).

Hypothesis: Machine learning models built with HB-SDH variables will accurately predict type 2 diabetes.

Methods: We used data from 2,493 participants without diabetes in the Adolescent to Adult Health study (mean age 15.5 years at baseline in 1994; 40.9% male) who contributed blood samples and anthropometric measures in wave 4 (2008) or wave 5 (2016-2018). We used an 80/20 split to form separate training and test datasets: the training dataset was used to build random forest and XGBoost models, and performance in classifying diabetes was evaluated in the test dataset. We first built a comparator model using 15 demographic and biomarker features, and then a second model that included 34 HB-SDH features measured during adulthood but no biomarkers.

Results: Five hundred sixteen participants developed type 2 diabetes. With XGBoost, the comparator model predicted diabetes with 92.6% accuracy (95% CI: 89.8%, 94.8%) and an area under the curve (AUC) of 0.8165, and the HB-SDH model had 88.7% accuracy (95% CI: 85.4%, 91.4%) and an AUC of 0.7631. With random forest, the comparator model had 93.0% accuracy (95% CI: 90.3%, 95.2%) and an AUC of 0.8376, while the HB-SDH model had 89.8% accuracy (95% CI: 88.6%, 92.4%) and an AUC of 0.7650.
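The accuracy intervals and AUC values above are standard test-set summaries. As a minimal sketch, the plain-Python code below shows one way to compute them, assuming the accuracy CIs are exact binomial (Clopper-Pearson) intervals of the kind R's `binom.test` reports; the abstract does not name the interval method or give the exact test-set size (20% of 2,493 is about 499). The helper names `clopper_pearson_ci` and `roc_auc` are hypothetical, and the counts and scores in the example are illustrative, not study data.

```python
import math


def _log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), valid for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def _binom_range(lo: int, hi: int, n: int, p: float) -> float:
    """P(lo <= X <= hi), summed from log-space terms so large n does not overflow."""
    return sum(math.exp(_log_binom_pmf(i, n, p)) for i in range(lo, hi + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Find p in (0, 1) with f(p) = target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2


def clopper_pearson_ci(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for a proportion k/n, e.g. accuracy."""
    # Lower bound solves P(X >= k | p) = alpha/2; this tail probability increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: _binom_range(k, n, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha/2; this tail probability decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: _binom_range(0, k, n, p), alpha / 2, False)
    return lower, upper


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 462 correct classifications among 499 test participants.
acc = 462 / 499
low, high = clopper_pearson_ci(462, 499)
print(f"accuracy {acc:.1%} (95% CI: {low:.1%}, {high:.1%})")
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # toy scores, not study data; → 0.75
```

The rank-based AUC equals the area under the empirical ROC curve when ties count one half; its pairwise loop is quadratic in the test-set size, which is acceptable for a few hundred participants.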
The 10 most important variables for classification in the random forest model were self-reported general health, waist circumference, self-reported anxiety diagnosis, self-reported hypertension diagnosis, income, BMI, education level, change in weight, financial scarcity, and experienced stigma. In the XGBoost model they were self-reported general health, experienced stigma, financial scarcity, self-reported anxiety diagnosis, experienced weight stigma, waist circumference, income, self-reported hypertension diagnosis, sugar-sweetened beverage intake, and BMI.

Conclusions: Machine learning models built with health behavior and social determinants of health variables perform comparably to models with biomarkers and identify salient features for diabetes risk. The features that contributed most to classification were similar between the random forest and XGBoost models. In this population, relevant HB-SDH included socioeconomic indicators, sugar-sweetened beverage intake, anxiety, and experienced stigma, in addition to adiposity and hypertension. These findings need to be replicated in diverse cohorts that may experience different HB-SDH.
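The statement that the two models drew on similar features can be made concrete by comparing the two top-10 lists reported above. The short Python sketch below copies those lists verbatim and computes their overlap and Jaccard index (shared features divided by all distinct features). `overlap_summary` is a hypothetical helper, and the comparison looks only at which features appear, not at their rank or importance scores.

```python
# Top-10 features copied from the abstract, in the order reported for each model.
RF_TOP10 = [
    "self-reported general health", "waist circumference",
    "self-reported anxiety diagnosis", "self-reported hypertension diagnosis",
    "income", "BMI", "education level", "change in weight",
    "financial scarcity", "experienced stigma",
]
XGB_TOP10 = [
    "self-reported general health", "experienced stigma", "financial scarcity",
    "self-reported anxiety diagnosis", "experienced weight stigma",
    "waist circumference", "income", "self-reported hypertension diagnosis",
    "sugar-sweetened beverage intake", "BMI",
]


def overlap_summary(first: list[str], second: list[str]) -> dict:
    """Compare two feature lists by set membership only (rank order is ignored)."""
    a, b = set(first), set(second)
    shared = a & b
    return {
        "shared": sorted(shared),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "jaccard": len(shared) / len(a | b),  # shared count / union count
    }


summary = overlap_summary(RF_TOP10, XGB_TOP10)
print(len(summary["shared"]), "shared features")  # → 8 shared features
print("random forest only:", summary["only_first"])
print("XGBoost only:", summary["only_second"])
print(f"Jaccard index: {summary['jaccard']:.2f}")  # → 0.67
```

Eight of the ten features appear in both lists; education level and change in weight appear only for random forest, and experienced weight stigma and sugar-sweetened beverage intake appear only for XGBoost.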
