Abstract

This study sought to determine the optimal method of modeling a zero-inflated outcome by comparing generalized linear models (GLMs) that vary in distribution (negative binomial and Tweedie) and in the inclusion of an offset. Participants in the 2016 EU5 (France, Germany, Italy, Spain, and the United Kingdom) administration of the National Health and Wellness Survey who self-reported cardiovascular disease (CVD; n=3,685) were compared to those without CVD (controls; n=76,915) on costs derived from counts of hospitalizations, emergency room visits, and primary care provider (PCP) visits occurring in the preceding six months. Four GLMs were fit for each outcome: negative binomial and Tweedie models, each with and without an offset. The negative binomial is widely used, but the Tweedie distribution is a reasonable option because it allows for more flexible modeling of zeros and extreme values. Using an offset allows for the modeling of self-reported counts directly. Fit indices, for which lower values indicate better fit, included the Akaike information criterion (AIC), mean absolute error (MAE), and root mean square error (RMSE). GLM parameters comparing the CVD and control groups on these outcomes were also reviewed to determine whether modeling options affected statistical significance. GLMs with offsets outperformed models without them for all cost outcomes (average improvement of 210,638, €83, and €23,149 for AIC, MAE, and RMSE, respectively). Among models with offsets, Tweedie models performed better on MAE and RMSE (average improvement of €253 and €70,209, respectively), while the negative binomial models had a slightly lower AIC (average improvement of 4,231). Additionally, Tweedie model parameter estimates had narrower confidence intervals and detected a significant effect of CVD on PCP visit costs (p<.05). GLMs with a Tweedie distribution and offsets are the preferred choice because they demonstrated better fit and changed the substantive interpretation of model parameters.
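
For readers unfamiliar with the three fit indices, the sketch below shows how each is conventionally computed. It is an illustration only, not the authors' code: the function names, the toy cost values, and the log-likelihood passed to `aic` are all hypothetical and are not taken from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def mae(observed, predicted):
    """Mean absolute error, in the units of the outcome (here, euros)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error; penalizes large misses more than MAE does."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical zero-inflated costs: mostly zeros plus one extreme value.
observed = [0.0, 0.0, 0.0, 120.0, 2000.0]
predicted = [10.0, 10.0, 10.0, 100.0, 1500.0]

print(mae(observed, predicted))   # average absolute miss per respondent
print(rmse(observed, predicted))  # dominated by the single extreme value
print(aic(-100.0, 3))             # hypothetical log-likelihood and parameter count
```

On data like these, a single extreme cost pulls RMSE well above MAE, which is one reason the two indices can show improvements of very different sizes for the same model.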
