Abstract

Background

Reliable evaluations of state-level policies are essential for identifying effective policies and informing policymakers’ decisions. State-level policy evaluations commonly use a difference-in-differences (DID) study design; yet within this framework, statistical model specification varies notably across studies. More guidance is needed about which statistical models perform best when estimating how state-level policies affect outcomes.

Methods

Motivated by applied state-level opioid policy evaluations, we implemented an extensive simulation study to compare the statistical performance of multiple variations of the two-way fixed effects models traditionally used for DID under a range of simulation conditions. We also explored the performance of autoregressive (AR) and generalized estimating equation (GEE) models. We simulated policy effects on annual state-level opioid mortality rates and assessed statistical performance using various metrics, including directional bias, magnitude bias, and root mean squared error. We also reported Type I error rates and the rate of correctly rejecting the null hypothesis (i.e., power), given the prevalence of frequentist null hypothesis significance testing in the applied literature.

Results

Most linear models resulted in minimal bias. However, non-linear models and population-weighted versions of classic linear two-way fixed effects and linear GEE models yielded considerable bias (60 to 160%). Further, root mean squared error was minimized by linear AR models when we examined crude mortality rates and by negative binomial models when we examined raw death counts. In the context of frequentist hypothesis testing, many models yielded high Type I error rates and very low rates of correctly rejecting the null hypothesis (< 10%), raising concerns of spurious conclusions about policy effectiveness in the opioid literature. Considering performance across models, the linear AR models were optimal in terms of directional bias, root mean squared error, Type I error, and correct rejection rates.

Conclusions

The findings highlight notable limitations of commonly used statistical models for DID designs, which are widely used in opioid policy studies and in state policy evaluations more broadly. In contrast, the optimal model we identified, the AR model, is rarely used in state policy evaluation. We urge applied researchers to move beyond the classic DID paradigm and adopt AR models.
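For concreteness, the two main model classes compared above can be sketched as follows (our notation; the abstract itself does not spell out the specifications). The classic two-way fixed effects DID model includes state and year fixed effects, while the AR alternative conditions on the prior year's outcome in place of state fixed effects:

\[
\text{TWFE DID:}\quad Y_{st} = \alpha_s + \gamma_t + \beta\,\text{Policy}_{st} + \varepsilon_{st}
\]
\[
\text{AR:}\quad Y_{st} = \phi\,Y_{s,t-1} + \gamma_t + \beta\,\text{Policy}_{st} + \varepsilon_{st}
\]

Here \(Y_{st}\) is the opioid mortality rate in state \(s\) and year \(t\), \(\alpha_s\) and \(\gamma_t\) are state and year fixed effects, \(\text{Policy}_{st}\) indicates whether the policy is in effect, and \(\beta\) is the policy effect of interest.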

Highlights

  • Reliable evaluations of state-level policies are essential for identifying effective policies and informing policymakers’ decisions

  • Using a simulation study based on observed state-level opioid mortality, we assessed statistical performance using various metrics, including directional bias, magnitude bias, and root mean squared error; we also reported Type I error and the rate of correctly rejecting the null hypothesis, given the prevalence of frequentist null hypothesis significance testing (NHST) in the applied literature (see the metrics sketch after this list)

  • We tested multiple statistical models via simulation; within our four primary DID variations, we considered three other estimation aspects: generalized linear model (GLM) link function specification, standard error estimation, and weighting to account for state population (see the model sketch after this list)
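The model variations in the last bullet can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code: the simulated data, column names, and exact specifications are our assumptions. It fits a population-weighted two-way fixed effects DID model with state-clustered standard errors, alongside a linear AR alternative that conditions on the lagged outcome, using Python's statsmodels.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)

    # Toy state-year panel: 50 states x 20 years with staggered policy adoption.
    states = [f"S{i:02d}" for i in range(50)]
    years = list(range(1999, 2019))
    df = pd.DataFrame([(s, t) for s in states for t in years],
                      columns=["state", "year"])
    pop = {s: int(rng.integers(500_000, 30_000_000)) for s in states}
    adopt = {s: int(rng.integers(2005, 2015)) for s in states}
    df["pop"] = df["state"].map(pop)
    df["policy"] = (df["year"] >= df["state"].map(adopt)).astype(int)
    # Toy outcome: baseline mortality rate plus a small policy effect.
    df["mort_rate"] = rng.normal(10.0, 2.0, len(df)) + 0.5 * df["policy"]

    # Classic two-way fixed effects DID: state and year fixed effects,
    # population weights, state-clustered standard errors.
    twfe = smf.wls(
        "mort_rate ~ policy + C(state) + C(year)",
        data=df, weights=df["pop"],
    ).fit(cov_type="cluster", cov_kwds={"groups": df["state"]})

    # Linear AR alternative: condition on the prior year's outcome
    # instead of state fixed effects.
    df = df.sort_values(["state", "year"])
    df["mort_lag"] = df.groupby("state")["mort_rate"].shift(1)
    dfl = df.dropna(subset=["mort_lag"])
    ar = smf.ols(
        "mort_rate ~ policy + mort_lag + C(year)",
        data=dfl,
    ).fit(cov_type="cluster", cov_kwds={"groups": dfl["state"]})

    print(twfe.params["policy"], ar.params["policy"])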
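The performance metrics in the second bullet can be summarized from simulation replicates with a small helper, also a hypothetical sketch (in particular, we approximate magnitude bias here as relative bias; the paper's exact definition may differ):

    import numpy as np

    def performance_metrics(estimates, pvalues, true_effect, alpha=0.05):
        """Summarize simulation replicates for one model.

        estimates, pvalues: one entry per simulated replicate.
        When true_effect == 0 the rejection rate is the Type I error
        rate; otherwise it is the rate of correctly rejecting the
        null hypothesis (power).
        """
        estimates = np.asarray(estimates, dtype=float)
        bias = estimates.mean() - true_effect          # directional bias
        rel_bias = bias / true_effect if true_effect else np.nan
        rmse = np.sqrt(np.mean((estimates - true_effect) ** 2))
        reject = np.mean(np.asarray(pvalues) < alpha)
        return {"bias": bias, "relative_bias": rel_bias,
                "rmse": rmse, "rejection_rate": reject}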



Introduction

Reliable evaluations of state-level policies are essential for identifying effective policies and informing policymakers’ decisions, yet the methodological rigor of published studies varies (see Schuler et al. (2020) for a review of the opioid policy literature). More guidance is needed about which statistical models perform best when estimating how state-level policies affect outcomes. The choice of model specification, along with other factors, including low outcome occurrence rates (e.g., opioid mortality), sample size (both the number of policy states and the number of available time points), and differences across states prior to policy adoption, can affect the accuracy and precision of effect estimates. Despite the wealth of knowledge concerning the challenges of and best practices for DID designs in various settings, the applied literature largely does not reflect these insights [17,18,19,20].

