Abstract

Count data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation into the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.

Highlights

  • In public health and epidemiology research, count data with a large proportion of zeros are often encountered

  • We evaluate the performance of the zero-inflated negative binomial (ZINB) and hurdle negative binomial (HNB) models when the data are simulated from an HNB model

  • Our simulation results demonstrate that when the data contain zero-deflated data points, as depicted in the left panel of Fig. 1, the ZINB model performs poorly compared with its HNB counterpart, yielding a higher Akaike information criterion (AIC) and a significant difference in model fit according to Vuong’s test (Fig. 5)
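
The zero-deflation point in the last highlight can be illustrated with a small simulation. The sketch below is not the article's simulation design: the function names, the parameter values (mean 3, dispersion 1.5, 30% structural zeros, 5% hurdle zeros) and the inverse-transform sampler are all assumptions chosen for illustration. It contrasts the two data-generating processes. A ZINB draw mixes structural zeros with an ordinary negative binomial, so its zero fraction can never fall below that of the plain negative binomial. An HNB draw sets the zero probability directly and takes positive counts from a zero-truncated negative binomial, so it can also produce fewer zeros than the plain negative binomial would.

```python
import random

def nb_zero_prob(mean, size):
    """P(Y = 0) under a negative binomial with the given mean and dispersion `size`."""
    return (size / (size + mean)) ** size

def nb_inverse_cdf(u, mean, size, max_count=10_000):
    """Smallest k with CDF(k) > u, built from the NB pmf recurrence."""
    p = size / (size + mean)
    k = 0
    pmf = nb_zero_prob(mean, size)
    cdf = pmf
    # max_count only guards against floating-point round-off when u is near 1
    while cdf <= u and k < max_count:
        pmf *= (k + size) / (k + 1) * (1.0 - p)   # pmf(k+1) from pmf(k)
        k += 1
        cdf += pmf
    return k

def draw_zinb(rng, pi_extra, mean, size):
    """ZINB: structural zero with probability pi_extra, else an ordinary NB draw
    (which may itself be a sampling zero)."""
    if rng.random() < pi_extra:
        return 0
    return nb_inverse_cdf(rng.random(), mean, size)

def draw_hnb(rng, p_zero, mean, size):
    """HNB: zero with probability p_zero, else a draw from the zero-truncated NB."""
    if rng.random() < p_zero:
        return 0
    p0 = nb_zero_prob(mean, size)
    u = p0 + (1.0 - p0) * rng.random()   # uniform on [p0, 1): skips the zero mass
    return nb_inverse_cdf(u, mean, size)

# Illustrative (assumed) settings, not taken from the article
rng = random.Random(0)
n, mean, size = 50_000, 3.0, 1.5
base_zero = nb_zero_prob(mean, size)

zinb = [draw_zinb(rng, 0.30, mean, size) for _ in range(n)]
hnb = [draw_hnb(rng, 0.05, mean, size) for _ in range(n)]   # 0.05 < base_zero: zero deflation

zinb_zero = zinb.count(0) / n
hnb_zero = hnb.count(0) / n
print(f"plain NB P(0) = {base_zero:.3f}, ZINB zeros = {zinb_zero:.3f}, HNB zeros = {hnb_zero:.3f}")
```

Under these assumed settings the hurdle sample has a smaller zero fraction than the plain negative binomial, a pattern that no choice of the ZINB mixing probability can reproduce. This is one way to see why a ZINB fit may struggle on zero-deflated HNB data.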


Summary

Introduction

In public health and epidemiology research, count data with a large proportion of zeros are often encountered. Zero-inflated and hurdle models are used to model zero-inflation when regular count models such as Poisson or negative binomial are unrealistic. Both types of models have gained increasing popularity in many fields, including public health services research (Neelon et al 2010; Neelon et al 2013; Neelon et al 2016), substance abuse (DeSantis and Bandyopadhyay 2011; Buu et al 2012), occupational injury (Yau and Lee 2001), medicine (Bohning et al 1999; Rose et al 2006), psychology (Atkins and Gallop 2007), public health (Yau and Lee 2001; Yau et al 2003; YB 2002; Sharker et al 2020), and ecological and environmental studies (Agarwal et al 2002; Rathbun and Fei 2006; Feng and Dean 2012; Feng 2020).

Source: Feng, Journal of Statistical Distributions and Applications (2021) 8:8.

Statistical models
Generating processes for excessive zeros versus sampling zeros
Relative fit measures
Conclusion and future work