Abstract

Background

Obesity and diabetes are global public health concerns. Studies indicate a relationship between socioeconomic, demographic, and environmental variables and the spatial patterns of diet-related chronic disease. In this paper, we propose a methodology that uses model-based clustering and variable selection to predict rates of obesity and diabetes. We test this method through an application in the northeastern United States.

Methods

We use model-based clustering, an unsupervised learning approach, to find latent clusters of similar US counties based on a set of socioeconomic, demographic, and environmental variables chosen through variable selection. We then use analysis of variance (ANOVA) and post hoc Tukey comparisons to examine differences in rates of obesity and diabetes across the clusters in the resulting clustering solution.

Results

We find access to supermarkets, median household income, population density, and socioeconomic status to be important in clustering the counties of two northeastern states. The cluster analysis identifies two sets of counties with significantly lower rates of diet-related chronic disease than those observed in the other clusters. These relatively healthy clusters are distinguished by the large central and large fringe metropolitan areas contained in their component counties. However, the relationship between sociodemographic factors and diet-related chronic disease is more complicated than previous research would suggest. Additionally, we find evidence of low food access in two clusters of counties adjacent to large central and fringe metropolitan areas. While food access has previously been seen as a problem of inner-city or remote rural areas, this study offers preliminary evidence of declining food access in suburban areas.

Conclusions

Model-based clustering with variable selection offers a new approach to analyzing socioeconomic, demographic, and environmental data for diet-related chronic disease prediction. In a test application to two northeastern states, this method allows us to identify two sets of metropolitan counties with significantly lower diet-related chronic disease rates than those observed in most rural and suburban areas. The method could be applied to larger geographic areas or to other countries with comparable data sets, offering a promising approach for researchers interested in the global increase in diet-related chronic disease.

Electronic supplementary material

The online version of this article (doi:10.1186/s12942-015-0017-5) contains supplementary material, which is available to authorized users.
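To make the ANOVA and post hoc Tukey step in the Methods concrete, here is a minimal pure-Python sketch that computes the one-way ANOVA F statistic across clusters and the pairwise Tukey-Kramer q statistics (the unequal-group-size form of Tukey's comparison). The function names one_way_anova and tukey_kramer_q, the cluster labels A, B, C, and the rates are all hypothetical, made up for illustration; they are not the study's code or data, and the sketch stops at the test statistics rather than computing p-values.

```python
from itertools import combinations
from math import sqrt


def one_way_anova(groups):
    """One-way ANOVA on {cluster: [rates]}; returns (F, MS_within, df_between, df_within)."""
    values = [v for vals in groups.values() for v in vals]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    # Between-cluster sum of squares: how far each cluster mean sits from the grand mean.
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2 for g, vals in groups.items())
    # Within-cluster sum of squares: how far counties sit from their own cluster mean.
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, ms_within, df_between, df_within


def tukey_kramer_q(groups, ms_within):
    """Studentized-range statistic q for each pair of clusters (Tukey-Kramer form)."""
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    q = {}
    for a, b in combinations(groups, 2):
        se = sqrt(ms_within / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q


# Made-up obesity rates (%) for three hypothetical clusters; not data from the study.
rates = {
    "A": [20.0, 22.0, 21.0],
    "B": [30.0, 29.0, 31.0],
    "C": [25.0, 26.0, 24.0],
}
F, ms_within, df_b, df_w = one_way_anova(rates)
q = tukey_kramer_q(rates, ms_within)
print(f"F({df_b}, {df_w}) = {F:.1f}")  # F(2, 6) = 61.0
for pair, stat in q.items():
    print(pair, round(stat, 2))
```

With real data, F would be compared with an F distribution on (k - 1, N - k) degrees of freedom, and each q with the studentized range distribution for k groups and N - k degrees of freedom. SciPy's scipy.stats.f_oneway and scipy.stats.tukey_hsd (SciPy 1.8 or later) report the corresponding p-values directly.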

Highlights

  • Obesity and diabetes are global public health concerns

  • While two-sample t-tests on the obesity, diabetes, and median household income variables confirm that each of these variables has significantly different means in the two states, the means of the other socioeconomic, demographic, and environmental variables do not differ significantly across states

  • Variable selection on the five standardized variables, namely unemployment, population density, median household income, socioeconomic status (SES), and low access to food, determined that unemployment was not useful for clustering
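Once unemployment is dropped, the counties are clustered on the remaining four standardized variables. As a rough illustration of what model-based clustering involves, the sketch below z-scores the variables, fits Gaussian mixture models by the EM algorithm, and chooses the number of clusters by the Bayesian information criterion (BIC). It is not the authors' procedure: it assumes diagonal covariance matrices only, a farthest-point initialization, and a variance floor (VAR_FLOOR), all choices made for this example; it omits the variable-selection search itself; and the data are synthetic. All function names here are hypothetical. BIC is written as -2 log L + p log n, so lower is better; some software (for example R's mclust) reports the negated form, where higher is better.

```python
import numpy as np

VAR_FLOOR = 1e-3  # assumption: variance floor (standardized units) to stop components collapsing


def standardize(X):
    """Z-score each column so variables on different scales contribute comparably."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def farthest_point_init(X, k, rng):
    """Start at a random observation, then repeatedly add the point farthest from those chosen."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - X[idx]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx]


def fit_diag_gmm(X, k, rng, n_iter=500, tol=1e-8):
    """EM for a Gaussian mixture with diagonal covariances; returns (log-likelihood, hard labels)."""
    n, d = X.shape
    means = farthest_point_init(X, k, rng)
    variances = np.tile(X.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log N(x_i | mean_j, diag(var_j)) for every point i and component j.
        diff2 = (X[:, None, :] - means) ** 2                          # (n, k, d)
        log_dens = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                           + (diff2 / variances).sum(axis=2))         # (n, k)
        log_joint = np.log(weights) + log_dens
        log_px = np.logaddexp.reduce(log_joint, axis=1)                # log p(x_i)
        resp = np.exp(log_joint - log_px[:, None])                     # posterior memberships
        ll = log_px.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: update mixing weights, means and per-variable variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff2 = (X[:, None, :] - means) ** 2
        variances = (resp[:, :, None] * diff2).sum(axis=0) / nk[:, None] + VAR_FLOOR
    return ll, resp.argmax(axis=1)


def bic(ll, n, d, k):
    """BIC = -2 log L + p log n; p = (k - 1) weights + k*d means + k*d variances."""
    p = (k - 1) + 2 * k * d
    return -2.0 * ll + p * np.log(n)


def select_k(X, k_values=range(1, 7), n_init=5, seed=0):
    """Fit each k with several restarts, keep the best fit, and choose k by minimum BIC."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    results = {}
    for k in k_values:
        ll, labels = max((fit_diag_gmm(X, k, rng) for _ in range(n_init)), key=lambda r: r[0])
        results[k] = (bic(ll, n, d, k), labels)
    best_k = min(results, key=lambda k: results[k][0])
    return best_k, {k: r[0] for k, r in results.items()}, results[best_k][1]


# Hypothetical demo: 3 synthetic groups of 60 "counties" on 4 made-up variables.
rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 8]], dtype=float)
X_raw = np.vstack([c + rng.normal(size=(60, 4)) for c in centers])
best_k, bics, labels = select_k(standardize(X_raw))
print(best_k, {k: round(b, 1) for k, b in bics.items()})
```

Allowing full or otherwise structured covariance matrices would change only the M-step and the parameter count p; the BIC comparison across candidate models works the same way.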


Summary

Introduction

In the United States, where one in three adults qualifies as obese [8] and nearly one in ten suffers from diabetes [5], researchers have identified geographic patterns in the prevalence of diet-related chronic disease. Recent analysis suggests that these patterns may be more complex: while researchers continued to observe high obesity rates in rural southern counties, they saw lower obesity rates in metropolitan and non-metropolitan counties elsewhere in the United States [11].
