Abstract

Sample size requirements are common in many multivariate analysis techniques as a measure for ensuring their robustness; however, such requirements have received little attention in the area of count data models. This study therefore investigated the effect of sample size on the efficiency of six commonly used count data models, namely the Poisson regression model (PRM), negative binomial regression model (NBRM), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle model (PHM) and negative binomial hurdle model (NBHM). The data used in this study were sourced from Data First and were collected by Statistics South Africa through the Marriage and Divorce database. PRM, NBRM, ZIP, ZINB, PHM and NBHM were fitted to ten randomly selected samples ranging in size from 4392 to 43916 and differing in size by 10%. The six models were compared using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Vuong’s test, McFadden’s pseudo R-squared, mean square error (MSE) and mean absolute deviation (MAD). The results revealed that, in general, the negative binomial-based models outperformed the Poisson-based models. However, the results did not reveal a clear effect of sample size on the efficiency of the models, since AIC, BIC, Vuong’s test, McFadden’s pseudo R-squared, MSE and MAD did not change consistently as the sample size increased.
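The comparison criteria named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes each fitted model has already been summarised by its maximised log-likelihood, its number of estimated parameters and its fitted means, and the function names are hypothetical. MSE and MAD are computed here as the mean squared and mean absolute differences between observed counts and fitted means, which is an assumption about how the study defined them.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!) over observations."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC); lower values are better for models fitted to the same data."""
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n_obs) - 2 * loglik
    return aic, bic


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


def mse(observed, fitted):
    """Mean squared difference between observed counts and fitted means (assumed definition)."""
    return sum((y - m) ** 2 for y, m in zip(observed, fitted)) / len(observed)


def mad(observed, fitted):
    """Mean absolute difference between observed counts and fitted means (assumed definition)."""
    return sum(abs(y - m) for y, m in zip(observed, fitted)) / len(observed)


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the study
    y = [0, 0, 1, 2, 5]
    mu_model = [0.4, 0.6, 1.1, 2.0, 3.9]    # hypothetical fitted means
    mu_null = [sum(y) / len(y)] * len(y)   # intercept-only Poisson predicts the sample mean
    ll_model = poisson_loglik(y, mu_model)
    ll_null = poisson_loglik(y, mu_null)
    print(information_criteria(ll_model, n_params=2, n_obs=len(y)))
    print(mcfadden_r2(ll_model, ll_null), mse(y, mu_model), mad(y, mu_model))
```

For McFadden's pseudo R-squared, the null log-likelihood comes from an intercept-only version of the same model; for a Poisson model that fit predicts the sample mean for every observation, which is what `poisson_loglik` scores above. BIC penalises each extra parameter by ln(n) rather than 2, so it is the stricter of the two criteria whenever n ≥ 8.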

Highlights

  • The study compared the efficiency of PRM, NBRM, ZIP, ZINB, PHM and NBHM under ten sample sizes (4392, 8783, 13175, 17566, 21958, 25107, 29311, 33478, 37667 and 41881) using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), McFadden’s pseudo R-squared, Vuong’s test, mean square error (MSE) and mean absolute deviation (MAD).
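Vuong’s test, listed among the criteria above, compares two non-nested models fitted to the same observations, for example ZIP against PRM. The sketch below computes the basic (uncorrected) statistic from per-observation log-likelihoods. It assumes those values have already been obtained from the two fitted models, the function name `vuong_test` is hypothetical, and it is not the study's implementation; software packages differ in details such as AIC/BIC-type corrections and whether the standard deviation uses n or n-1.

```python
import math
import statistics


def vuong_test(loglik_1, loglik_2):
    """Uncorrected Vuong statistic for two non-nested models.

    loglik_1, loglik_2: per-observation log-likelihoods ln f1(y_i) and ln f2(y_i),
    evaluated at each model's fitted parameters, in the same observation order.
    Returns (V, two-sided p-value). V > 0 favours model 1, V < 0 favours model 2.
    """
    if len(loglik_1) != len(loglik_2) or len(loglik_1) < 2:
        raise ValueError("need two equal-length lists with at least 2 observations")
    # Pointwise log-likelihood ratios m_i = ln f1(y_i) - ln f2(y_i)
    m = [a - b for a, b in zip(loglik_1, loglik_2)]
    sd = statistics.stdev(m)  # n-1 denominator; some implementations use n
    if sd == 0:
        raise ValueError("log-likelihood differences have zero variance")
    v = math.sqrt(len(m)) * statistics.mean(m) / sd
    # Under the null of equally close models, V is asymptotically standard normal
    p = 2 * (1 - statistics.NormalDist().cdf(abs(v)))
    return v, p
```

At the conventional 5% level, V > 1.96 favours the first model, V < -1.96 favours the second, and values in between do not distinguish the two.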

Summary

Introduction

Count data are defined by Hilbe (2014) as observations that take only non-negative integer values, theoretically ranging from zero to the maximum value of the variable being modelled. The Poisson regression model (PRM) is the basic model for count responses and assumes that the conditional mean of the outcome variable equals its conditional variance, a property known as equi-dispersion (Vach, 2012). In practice this assumption is often violated by over-dispersion (conditional variance exceeding the conditional mean), which the negative binomial regression model (NBRM) accommodates, and by an excess of zeros, which the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models address. Other challenges that may arise in count response modelling are under-dispersion and zero-deflation, although they seldom occur in practice (Morel & Neerchal, 2012; Ozmen & Famoye, 2007). Despite their rarity, under-dispersion and zero-deflation motivated the development of hurdle models, namely the Poisson hurdle model (PHM) and the negative binomial hurdle model (NBHM), which are described in detail by Rose, Martin, Wannemuehler & Plikaytis (2006). Based on their popularity, the models considered in this study are the PRM, NBRM, ZIP, ZINB, PHM and NBHM.
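The distinctions drawn in the introduction (equi-dispersion in the PRM, extra variance in the NBRM, extra zeros mixed in by ZIP, and a separate zero process in hurdle models) can be made concrete through the probability mass functions behind these models. The sketch below is illustrative only: it assumes the common NB2 parameterisation of the negative binomial (variance mu + alpha * mu^2), uses hypothetical function names, and omits the covariates and link functions that turn these distributions into regression models.

```python
import math
import statistics


def dispersion_index(y):
    """Sample variance / sample mean: about 1 under equi-dispersion, > 1 if over-dispersed."""
    return statistics.variance(y) / statistics.mean(y)


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu (mean = variance = mu)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def negbin_pmf(y, mu, alpha):
    """NB2 pmf with mean mu and variance mu + alpha * mu**2 (alpha > 0)."""
    r = 1.0 / alpha      # size (shape) parameter
    p = r / (r + mu)     # success probability, chosen so that the mean equals mu
    log_prob = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_prob)


def zip_pmf(y, mu, pi):
    """Zero-inflated Poisson: a structural zero with probability pi, otherwise Poisson(mu)."""
    count_part = (1.0 - pi) * poisson_pmf(y, mu)
    return pi + count_part if y == 0 else count_part


def hurdle_poisson_pmf(y, mu, p0):
    """Poisson hurdle: P(Y = 0) = p0; positive counts follow a zero-truncated Poisson(mu)."""
    if y == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(y, mu) / (1.0 - math.exp(-mu))


if __name__ == "__main__":
    mu = 2.0
    print("Poisson P(0):", poisson_pmf(0, mu))
    print("ZIP P(0), pi = 0.3:", zip_pmf(0, mu, 0.3))
    print("Hurdle P(0), p0 = 0.05:", hurdle_poisson_pmf(0, mu, 0.05))
```

Because pi cannot be negative, the ZIP probability of a zero never falls below the Poisson value, whereas the hurdle model's separate zero probability p0 can; this is why hurdle models can represent zero-deflation as well as zero-inflation. ZINB and the negative binomial hurdle model follow the same pattern with `negbin_pmf` in place of `poisson_pmf` (for the hurdle version, the truncation denominator becomes one minus the negative binomial probability of zero).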
