Comparison of Data Science and Qualitative Approaches for Variable Selection of County‐Level Social Determinants of Health

L Evans, A White, K Witgert

doi:10.1111/1475-6773.13388

Abstract

Research Objective

Health services researchers increasingly use social determinants of health (SDOH) variables in quantitative models, and many publicly available data sources contain scores of high‐quality, complete SDOH variables. However, deciding which of the available SDOH variables are most important to include makes variable selection challenging. One approach is to rely on a conceptual framework, prior research, and intuition. Often, however, conceptual framework domains broadly describe “external context” or “community factors” and provide little help in identifying specific variables to use. Data science methods, particularly random forest regression, offer a potential data‐driven approach to SDOH variable selection. This study compared a qualitative approach and a data‐driven approach to SDOH variable selection to identify key SDOH predictors of county‐level health outcomes.

Study Design

We constructed an initial dataset of county‐level SDOH variables compiled from the following data sources: Area Health Resources File, County Health Rankings, American Community Survey, Picture of Subsidized Households, Penn State University’s Social Capital Index, and the Food Environment Atlas. We then employed a qualitative variable selection approach using the Healthy People 2020 organizing framework for SDOH. We purposively selected 6 variables that touched on all 5 domains of the framework, had sufficient variation across counties, were relatively normally distributed, and had established associations with health outcomes in the literature. Next, we employed a data‐driven variable selection approach using random forest regression. We fit 3 random forest regression models, each with a different county‐level health outcome specified, and determined the top 6 SDOH predictors driving each outcome. The outcomes were premature death (days of life lost), proportion of the population reporting fair or poor health, and preventable hospitalization rate (ambulatory care sensitive conditions). We identified overlap among the top 6 SDOH predictors from each random forest model to determine the final set of variables for the data‐driven approach. We then compared the SDOH variables determined using the data‐driven approach to those selected using the qualitative approach.

Population Studied

We included all 3142 U.S. counties in the analysis, and our dataset contained 81 SDOH variables.

Principal Findings

We selected the following SDOH variables using the qualitative approach: median household income, poverty rate, primary care physician‐to‐population ratio, social deprivation index, food environment index, and proportion of the population that reports severe housing problems. The following SDOH variables were selected using the data‐driven approach: median household income (3 models), poverty rate (2 models), proportion of the population with some college (2 models), proportion of the population that reports excessive drinking (2 models), proportion of the population that identifies as American Indian or Alaskan Native (2 models), and social capital index (2 models). Two of the 6 variables selected using the qualitative approach (median household income and poverty rate) were validated by the data‐driven approach.

Conclusions

Random forest models can assist with SDOH variable selection for quantitative analysis. However, variables selected using these techniques may not align well with those selected using qualitative approaches.

Implications for Policy or Practice

Researchers should consider using data science approaches to validate and complement, rather than supplant, qualitative approaches to variable selection.
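
The overlap step in the data‐driven approach (take the top 6 predictors from each outcome model, then keep the variables that recur across models) can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract does not say how variable importance was measured or what cutoff defined "overlap", so the function names `top_k` and `overlapping_predictors`, the `min_models=2` threshold, and the example scores are all assumptions. The importance scores could come from any fitted model, for instance the `feature_importances_` attribute of scikit-learn's `RandomForestRegressor`.

```python
from collections import Counter


def top_k(importances, k=6):
    """Return the k variable names with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def overlapping_predictors(importances_by_outcome, k=6, min_models=2):
    """Tally how many outcome models rank each variable in their top k.

    Returns {variable: number_of_models} for variables that reach the
    min_models threshold (an assumed cutoff, not stated in the abstract).
    """
    counts = Counter()
    for importances in importances_by_outcome.values():
        counts.update(top_k(importances, k))
    return {name: n for name, n in counts.items() if n >= min_models}


# Hypothetical importance scores, invented purely for illustration.
# They are NOT results from the study.
example = {
    "outcome_1": {"var_a": 0.40, "var_b": 0.30, "var_c": 0.20, "var_d": 0.10},
    "outcome_2": {"var_a": 0.40, "var_c": 0.05, "var_d": 0.25, "var_e": 0.30},
    "outcome_3": {"var_a": 0.50, "var_b": 0.25, "var_e": 0.15, "var_f": 0.10},
}

# k=2 keeps the toy example small; the study used the top 6 per model.
selected = overlapping_predictors(example, k=2, min_models=2)
print(selected)
```

With the study's setup, `importances_by_outcome` would hold one entry per outcome model (premature death, fair or poor health, preventable hospitalization rate) and `k` would be 6. The returned counts correspond to the "(3 models)" and "(2 models)" annotations in the principal findings.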

More From: Health Services Research

Similar Papers

Linear and Non-Linear Associations of Gonorrhea Diagnosis Rates with Social Determinants of Health
Ramal Moonesinghe ... Benedict I Truman
International Journal of Environmental Research and Public Health | VOL. 9
01 Sep 2012

PCR55 Analyzing the Relationship between Social Determinants of Health (SDOH) on Hospital Capacity for COVID-19 Patients By Geographic Region
K Watkins ... K Rademacher
Value in Health | VOL. 25
25 Jun 2022

Exploring Social Determinants of Health as Predictors of Mortality During 2012-2016, Among Black Women with Diagnosed HIV Infection Attributed to Heterosexual Contact, United States.
Lakeshia Watson ... Xiaohong Hu
Journal of Racial and Ethnic Health Disparities | VOL. 6
12 Apr 2019

Abstract 14088: Impact of Social Determinants of Health on Blood Pressure and Cholesterol Control in a Smartphone-Based Cardiovascular Risk Self-Management Program
Vedant S Pargaonkar ... Brian Roach
Circulation | VOL. 148
07 Nov 2023
