Abstract

Background: The prevalence of type 2 diabetes has increased substantially in India over the past 3 decades. Undiagnosed diabetes presents a public health challenge, especially in rural areas, where laboratory testing for diagnosis may not be readily available.

Objectives: The present work explores the use of several machine learning and statistical methods to develop a predictive tool that screens for prediabetes using survey data from a food frequency questionnaire (FFQ), which were used to compute the Global Diet Quality Score (GDQS).

Methods: The binary outcome used throughout this study, prediabetes status (yes/no), was defined as a fasting blood glucose measurement ≥100 mg/dL. The algorithms used were the generalized linear model (GLM), random forest, least absolute shrinkage and selection operator (LASSO), elastic net (EN), and a generalized linear mixed model (GLMM) with family unit as a random (cluster) intercept to account for intrafamily correlation. Model performance was assessed on held-out test data, and the models were compared with respect to area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.

Results: The GLMM, GLM, LASSO, and random forest models each performed well (AUC >0.70), and their predictors included the GDQS food groups and age, among others. The fully adjusted GLMM, which included a random intercept for family unit, achieved slightly better results (AUC of 0.72) in classifying prediabetes in these cluster-correlated data.

Conclusions: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance. In addition, future studies examining the utility of the GDQS in screening for other noncommunicable diseases are recommended.
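The outcome definition and evaluation metrics described in the Methods can be illustrated with a small, dependency-free Python sketch. The 100 mg/dL cutoff comes from the study; the 0.5 classification threshold, the function names, and the example glucose values and predicted probabilities are hypothetical. AUC is computed here through its pairwise (Mann-Whitney) formulation rather than by tracing the ROC curve, which gives the same value.

```python
from typing import Sequence, Tuple

# Outcome definition from the study: fasting blood glucose >= 100 mg/dL => prediabetes.
PREDIABETES_FBG_MG_DL = 100.0

def prediabetes_label(fbg_mg_dl: float) -> int:
    """Return 1 (prediabetes) if fasting glucose is at or above 100 mg/dL, else 0."""
    return 1 if fbg_mg_dl >= PREDIABETES_FBG_MG_DL else 0

def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(y_true: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity and specificity after thresholding scores (0.5 is a hypothetical cutoff)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical held-out test set: glucose values and model-predicted probabilities.
fbg_test = [92.0, 105.0, 118.0, 99.0, 101.0, 87.0]
y_test = [prediabetes_label(v) for v in fbg_test]    # [0, 1, 1, 0, 1, 0]
p_test = [0.20, 0.65, 0.40, 0.55, 0.70, 0.10]

test_auc = auc(y_test, p_test)                        # 8 of 9 positive/negative pairs ranked correctly
sens, spec = sensitivity_specificity(y_test, p_test)  # 2/3 each at threshold 0.5
```

Because AUC depends only on how the predicted probabilities rank participants, it does not depend on any threshold, whereas sensitivity and specificity shift as the cutoff moves.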

Highlights

  • Type 2 diabetes (T2D) continues to increase substantially in South Asia [1], with more than half of T2D cases undiagnosed [2]

  • Model performance: In this cohort of 5655 participants from rural South India, we found that a generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), random forest, and a generalized linear mixed model (GLMM) with a random effect for family cluster all demonstrated adequate predictive capability (AUC >0.70) for identifying prediabetes using only predictors derived from questionnaire data

  • The GLMM using age and the total Global Diet Quality Score (GDQS) as predictors obtained an AUC only 0.001 lower than the GLMM using age and the GDQS food groups, suggesting that including the GDQS in either form yields similar performance on this task; however, using the food-group daily totals gives the model more information about specific components of the diet
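The trade-off in the last highlight can be made concrete with a toy Python sketch. It assumes, for illustration only, that the total score is a plain sum of per-food-group points; the food-group names and point values below are hypothetical and are not the official GDQS scoring tables. Two different diets can collapse to the same total, whereas a food-group feature vector keeps them apart.

```python
from typing import Dict, List, Sequence

def gdqs_total(points: Dict[str, float]) -> float:
    """Collapse per-food-group points into one total (assumed here to be a plain sum)."""
    return sum(points.values())

def food_group_features(age: float, points: Dict[str, float], groups: Sequence[str]) -> List[float]:
    """Predictor row: age followed by one column per food group, in a fixed order."""
    return [age] + [points.get(g, 0.0) for g in groups]

def total_score_features(age: float, points: Dict[str, float]) -> List[float]:
    """Predictor row: age followed by the single total score."""
    return [age, gdqs_total(points)]

# Hypothetical food groups and point values, for illustration only.
GROUPS = ["dark_green_leafy_vegetables", "legumes", "refined_grains", "sugary_beverages"]
diet_a = {"dark_green_leafy_vegetables": 2.0, "legumes": 4.0, "refined_grains": 2.0, "sugary_beverages": 0.0}
diet_b = {"dark_green_leafy_vegetables": 4.0, "legumes": 0.0, "refined_grains": 0.0, "sugary_beverages": 4.0}

# Same age and same total (8.0), so the total-score rows are identical...
row_total_a = total_score_features(45.0, diet_a)
row_total_b = total_score_features(45.0, diet_b)
# ...but the food-group rows differ, giving a model more dietary detail to work with.
row_groups_a = food_group_features(45.0, diet_a, GROUPS)
row_groups_b = food_group_features(45.0, diet_b, GROUPS)
```

Whether the extra columns improve discrimination is an empirical question; in this study the two forms gave nearly identical AUCs.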


Summary

Introduction

Type 2 diabetes (T2D) continues to increase substantially in South Asia [1], with more than half of T2D cases undiagnosed [2]. To mitigate rising rates of T2D, it is imperative to identify individuals with prediabetes and prevent their progression to T2D. This is especially crucial in low- and middle-income countries (LMICs) such as India, where over two-thirds of the population lives in resource-limited rural areas [3].

Objectives: The present work explores the use of several machine learning and statistical methods to develop a predictive tool that screens for prediabetes using survey data from a food frequency questionnaire (FFQ), which were used to compute the Global Diet Quality Score (GDQS).

Conclusions: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance.
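Two of the statistical methods named in this work can be written compactly. The notation below is ours, not the authors': i indexes participants, j indexes family units, and x_ij holds questionnaire-derived predictors such as age and GDQS food-group values. The logit link is an assumption, since the summary states only that the GLMM uses family unit as a random intercept, and the penalty is shown in the common glmnet-style parametrization.

```latex
% Random-intercept logistic GLMM (family unit j as the cluster)
\operatorname{logit} \Pr(y_{ij} = 1 \mid u_j) = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)

% Penalized logistic GLM: \ell_i is participant i's Bernoulli log-likelihood
(\hat{\beta}_0, \hat{\boldsymbol{\beta}}) = \arg\min_{\beta_0, \boldsymbol{\beta}}
\; -\frac{1}{N}\sum_{i=1}^{N} \ell_i(\beta_0, \boldsymbol{\beta})
+ \lambda \left[ \frac{1-\alpha}{2}\,\lVert \boldsymbol{\beta} \rVert_2^2 + \alpha\,\lVert \boldsymbol{\beta} \rVert_1 \right]
```

Setting α = 1 gives the LASSO and 0 < α < 1 gives the elastic net. The shared intercept u_j lets participants from the same family have a common baseline risk, which is how the GLMM accounts for intrafamily correlation.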
