Identifying predictors of cancer prevalence at the neighborhood level in the United States: A Bayesian machine learning approach

Li Niu, Bian Liu, Yan Li, Liangyuan Hu

doi:10.1289/isee.2021.p-284

Abstract

BACKGROUND AND AIM: Cancer is the second leading cause of death in the United States (US). Individual-level factors, including use of preventive services, health behaviors, environmental exposures, and sociodemographic characteristics, have been linked to individual-level cancer risk. However, risk factors for cancer at the neighborhood level remain understudied. To fill this research gap, we identified and ranked important predictors of cancer prevalence at the neighborhood level in the US.
METHODS: We developed a new neighborhood dataset by combining data from the Population Level Analysis and Community Estimates (PLACES), a dataset with population health data across all US census tracts (n=72,337), with environmental exposure data from the Environmental Justice Screening database and sociodemographic factors from the American Community Survey. Our outcome of interest was tract-level adult cancer prevalence. We included 23 tract-level explanatory variables, covering unhealthy behaviors (e.g., smoking, no leisure-time physical activity, drinking), prevention measures (e.g., cholesterol screening), environmental exposures (e.g., air toxics, lead paint), and sociodemographic factors (e.g., racial and ethnic composition, poverty, age 65 years and over). We used Bayesian additive regression trees (BART) to identify the most important predictors of cancer prevalence.
RESULTS: The median prevalence of adults diagnosed with cancer was 6.7% (interquartile range: 5.4%-7.7%) across US census tracts. Based on local threshold criteria, we identified the five most important predictors of cancer prevalence: percentage of adults aged 65 years or older, prevalence of routine checkups, percentage of non-Hispanic white residents, percentage of housing built before 1960, and percentage of individuals below the lower poverty level.
CONCLUSIONS: Using an integrated neighborhood dataset with fine geographic resolution and a machine learning approach, we identified several important predictors of cancer prevalence at the neighborhood level in the US. The results may help public health practitioners and policymakers prioritize improvements in environmental and neighborhood factors to reduce the cancer burden.
KEYWORDS: Cancer, Neighborhood-level analysis, Environmental exposures, Big data, Bayesian additive regression trees
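
The abstract does not spell out the "local threshold criteria" used with BART. A minimal sketch follows, assuming the term refers to the common permutation-based rule in which a predictor is retained when its observed variable inclusion proportion exceeds the (1 - alpha) quantile of its own null distribution, obtained by refitting BART on permuted outcomes. The BART fitting itself is not shown. The function names, the predictor names, and every number below are hypothetical placeholders for illustration, not results from the study.

```python
def quantile(values, q):
    """Empirical quantile using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def select_by_local_threshold(observed, null_draws, alpha=0.05):
    """Keep predictors whose observed inclusion proportion exceeds the
    (1 - alpha) quantile of that same predictor's permutation null."""
    selected = []
    for name, proportion in observed.items():
        cutoff = quantile(null_draws[name], 1 - alpha)
        if proportion > cutoff:
            selected.append((name, proportion, cutoff))
    # Rank retained predictors by observed inclusion proportion, largest first.
    selected.sort(key=lambda row: row[1], reverse=True)
    return selected


# Hypothetical inclusion proportions from a BART fit on the real outcome.
observed = {
    "pct_age_65_plus": 0.12,
    "routine_checkup": 0.09,
    "air_toxics": 0.04,
}

# Hypothetical null distributions: inclusion proportions from 100 BART fits
# on permuted outcomes. A fixed grid stands in for real permutation results.
null_draws = {name: [0.02 + 0.0004 * i for i in range(100)] for name in observed}

selected = select_by_local_threshold(observed, null_draws)
for name, proportion, cutoff in selected:
    print(f"{name}: proportion={proportion:.3f}, cutoff={cutoff:.3f}")
```

Because each predictor is compared against its own null distribution, this rule is less stringent than a single global cutoff shared by all predictors, and the ranking step mirrors how a top-five list could be read off from the retained set.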
