Exploration of Machine Learning and Statistical Techniques in Development of a Low-Cost Screening Method Featuring the Global Diet Quality Score for Detecting Prediabetes in Rural India

Nick Birk, Mika Matsuzaki, Teresa T Fung, Yanping Li, Carolina Batis, Meir J Stampfer, Megan Deitchler, Walter C Willett, Wafaie W Fawzi, Sabri Bromage, Sanjay Kinra, Shilpa N Bhupathiraju, Erin Lake

doi:10.1093/jn/nxab281

Abstract

Background: The prevalence of type 2 diabetes has increased substantially in India over the past 3 decades. Undiagnosed diabetes presents a public health challenge, especially in rural areas, where access to laboratory testing for diagnosis may not be readily available.

Objectives: The present work explores the use of several machine learning and statistical methods in the development of a predictive tool to screen for prediabetes using survey data from an FFQ to compute the Global Diet Quality Score (GDQS).

Methods: The outcome variable prediabetes status (yes/no) used throughout this study was determined based upon a fasting blood glucose measurement ≥100 mg/dL. The algorithms utilized included the generalized linear model (GLM), random forest, least absolute shrinkage and selection operator (LASSO), elastic net (EN), and generalized linear mixed model (GLMM) with family unit as a (cluster) random (intercept) effect to account for intrafamily correlation. Model performance was assessed on held-out test data, and comparisons made with respect to area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.

Results: The GLMM, GLM, LASSO, and random forest modeling techniques each performed quite well (AUCs >0.70) and included the GDQS food groups and age, among other predictors. The fully adjusted GLMM, which included a random intercept for family unit, achieved slightly superior results (AUC of 0.72) in classifying the prediabetes outcome in these cluster-correlated data.

Conclusions: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance. In addition, future studies to examine the utility of the GDQS in screening for other noncommunicable diseases are recommended.
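The outcome definition and the three evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy glucose values, the risk scores, and the 0.5 classification cutoff are all hypothetical, chosen only to show how a ≥100 mg/dL label, AUC, sensitivity, and specificity relate to one another.

```python
from typing import List, Tuple

# Fasting blood glucose cutoff stated in the Methods (mg/dL).
PREDIABETES_THRESHOLD_MG_DL = 100.0


def label_prediabetes(fasting_glucose_mg_dl: List[float]) -> List[int]:
    """Binary outcome: 1 if fasting glucose is >= 100 mg/dL, else 0."""
    return [1 if g >= PREDIABETES_THRESHOLD_MG_DL else 0
            for g in fasting_glucose_mg_dl]


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true: List[int], scores: List[float],
                            cutoff: float) -> Tuple[float, float]:
    """Classify as positive when score >= cutoff, then return
    (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical held-out data: glucose readings and model risk scores.
glucose = [92.0, 105.0, 99.0, 118.0, 100.0, 87.0]
risk_scores = [0.20, 0.70, 0.55, 0.80, 0.40, 0.10]

labels = label_prediabetes(glucose)
test_auc = auc(labels, risk_scores)
sens, spec = sensitivity_specificity(labels, risk_scores, cutoff=0.5)
```

AUC summarizes ranking quality across all cutoffs, whereas sensitivity and specificity depend on the particular cutoff chosen, which is why a screening tool would typically report all three.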

Highlights

Type 2 diabetes (T2D) continues to increase substantially in South Asia [1], with more than half of T2D cases being undiagnosed [2].
In this cohort of 5655 participants from rural South India, we found that a generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), random forest, and generalized linear mixed model (GLMM) with a random effect for family cluster all demonstrated adequate predictive capability for identifying prediabetes using only predictors derived from questionnaire data.
The GLMM using age and Global Diet Quality Score (GDQS) as predictors obtained an AUC only 0.001 lower than the GLMM using age and GDQS food groups as predictors, suggesting that accounting for the GDQS in either form leads to similar performance in this task; using the food group daily totals, however, provides the model with more information about specific components of the diet.

Summary

Introduction

Type 2 diabetes (T2D) continues to increase substantially in South Asia [1], with more than half of T2D cases being undiagnosed [2]. To mitigate the increasing rates of T2D, it is imperative to identify individuals with prediabetes to prevent progression to T2D. This is especially crucial in low- and middle-income countries (LMICs) like India, where over two-thirds of the population live in resource-limited, rural areas [3]. Objectives: The present work explores the use of several machine learning and statistical methods in the development of a predictive tool to screen for prediabetes using survey data from an FFQ to compute the Global Diet Quality Score (GDQS). Conclusions: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance.

Journal: The Journal of Nutrition
Publication Date: Oct 1, 2021
Citations: 10
License type: CC BY 4.0
