Background/Objectives: Understanding how dietary patterns and nutrient intake relate to chronic disease risk is critical for public health strategies. However, confounding by lifestyle and individual factors complicates the assessment of diet–disease associations. Emerging machine learning (ML) techniques offer new ways to clarify the relative importance of multifactorial predictors. This study used the XGBoost algorithm to investigate the associations between animal-sourced and plant-based dietary patterns and Type 2 diabetes (T2D) history, accounting for diet–lifestyle patterns. Methods: Using data from the National Health and Nutrition Examination Survey (NHANES) from 2013 to 2016, individuals consuming animal-sourced foods (ASFs) and plant-based foods (PBFs) were propensity score-matched on key confounders, including age, gender, body mass index (BMI), energy intake, and physical activity level. Predictors of T2D history were analyzed using the XGBoost classifier, with feature importance derived from Shapley plots. Lifestyle and dietary patterns derived from principal component analysis (PCA) were incorporated as predictors, and the predictors were checked for high multicollinearity. Results: A total of 2746 respondents were included in the analysis. The top predictors of T2D included age, BMI, unhealthy lifestyle, and the ω6:ω3 fatty acid ratio. Higher intakes of protein from ASFs and fats from PBFs were associated with lower T2D risk. The XGBoost model achieved an accuracy of 83.4% and an AUROC of 68%. Conclusions: This study underscores the complex interactions between diet, lifestyle, and body composition in T2D risk. Machine learning techniques like XGBoost provide valuable insights into these multifactorial relationships by mitigating confounding and identifying key predictors. Future research should focus on prospective studies incorporating detailed nutrient analyses and ML approaches to refine prevention strategies and dietary recommendations for T2D.
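The propensity score matching step named in the Methods can be illustrated with a minimal sketch. The abstract does not state which matching algorithm or caliper was used, so the greedy 1:1 nearest-neighbor scheme, the caliper of 0.05, the function name `match_on_propensity`, and the toy scores below are all assumptions made for illustration. The scores are taken as already estimated (for example, from a model of group membership on age, gender, BMI, energy intake, and physical activity).

```python
from typing import List, Tuple


def match_on_propensity(
    treated: List[Tuple[str, float]],
    control: List[Tuple[str, float]],
    caliper: float = 0.05,  # assumed maximum allowed score difference
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each record is (respondent_id, propensity_score). A treated record is
    paired with the closest unused control, provided the score difference
    does not exceed the caliper; otherwise it is left unmatched.
    """
    available = dict(control)  # id -> score; shrinks as controls are used
    pairs = []
    for t_id, t_score in sorted(treated, key=lambda rec: rec[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


# Hypothetical scores for ASF and PBF consumers (not study data).
asf = [("a1", 0.30), ("a2", 0.55), ("a3", 0.90)]
pbf = [("p1", 0.28), ("p2", 0.52), ("p3", 0.60), ("p4", 0.10)]

pairs = match_on_propensity(asf, pbf)
print(pairs)  # [('a1', 'p1'), ('a2', 'p2')]
```

In this toy example, respondent `a3` stays unmatched because no remaining control lies within the caliper, which is how matching can shrink the analytic sample relative to the full survey.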