Abstract

The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It re-uses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros (two problems that typically occur in count data sets in economics and the social sciences) better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.

Highlights

  • Modeling count variables is a common task in economics and the social sciences.

  • The classical Poisson regression model for count data is often of limited use in these disciplines because empirical count data sets typically exhibit over-dispersion and/or an excess number of zeros. The former issue can be addressed by extending the plain Poisson regression model in various directions, e.g., using sandwich covariances or estimating an additional dispersion parameter.

  • The classical Poisson, geometric and negative binomial models are described in a generalized linear model (GLM) framework; they are implemented in R by the glm() function (Chambers and Hastie 1992) in the stats package and the glm.nb() function in the MASS package (Venables and Ripley 2002).
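
The classical models named above can be fitted along the following lines. This is a minimal sketch on simulated data, not the medical care data analyzed in the paper; the variable names (y, x, dat) and all parameter values are assumptions made for illustration. The geometric model is fitted here as a negative binomial GLM with theta fixed at 1.

```r
# Hypothetical simulated data: over-dispersed counts with one regressor.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.8 * x), size = 1.5)
dat <- data.frame(y = y, x = x)

# Poisson regression (stats package).
fm_pois <- glm(y ~ x, data = dat, family = poisson)

# Quasi-Poisson: same mean function, plus an estimated dispersion parameter.
fm_qpois <- glm(y ~ x, data = dat, family = quasipoisson)

# Geometric regression: negative binomial family with theta fixed at 1.
fm_geom <- glm(y ~ x, data = dat, family = MASS::negative.binomial(theta = 1))

# Negative binomial regression with theta estimated by maximum likelihood.
fm_nb <- MASS::glm.nb(y ~ x, data = dat)

# Compare coefficient estimates across the four fits.
print(cbind(poisson = coef(fm_pois), quasi = coef(fm_qpois),
            geometric = coef(fm_geom), negbin = coef(fm_nb)))

# Estimated dispersion (quasi-Poisson) and theta (negative binomial).
print(summary(fm_qpois)$dispersion)
print(fm_nb$theta)
```

Poisson and quasi-Poisson give identical coefficient estimates; only the standard errors differ, because the quasi-Poisson fit rescales them by the estimated dispersion.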


Summary

Introduction

Modeling count variables is a common task in economics and the social sciences. The classical Poisson regression model for count data is often of limited use in these disciplines because empirical count data sets typically exhibit over-dispersion and/or an excess number of zeros. We discuss the implementation of hurdle and zero-inflated models in the functions hurdle() and zeroinfl() in the pscl package (Jackman 2008), available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=pscl. The design of both modeling functions as well as the methods operating on the associated fitted model objects follows that of the base R functionality so that the new software integrates into the computational toolbox for modeling count data in R.
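
As a rough illustration of how hurdle() and zeroinfl() mirror the glm() interface, the sketch below fits both models to simulated data with excess zeros. The data, the variable names (y, x, dat) and the parameter values are hypothetical and do not come from the paper; the code assumes the pscl package is installed and skips the model fits otherwise.

```r
# Hypothetical simulated data: negative binomial counts with extra zeros.
set.seed(2)
n <- 1000
x <- rnorm(n)
always_zero <- rbinom(n, size = 1, prob = 0.3)
y <- ifelse(always_zero == 1, 0, rnbinom(n, mu = exp(0.5 + 0.5 * x), size = 2))
dat <- data.frame(y = y, x = x)

# Share of zeros in the simulated response.
print(mean(dat$y == 0))

if (requireNamespace("pscl", quietly = TRUE)) {
  # Hurdle model: binomial zero hurdle plus truncated negative binomial counts.
  fm_hurdle <- pscl::hurdle(y ~ x, data = dat, dist = "negbin")

  # Zero-inflated model: regressors for the count part before "|",
  # regressors for the zero-inflation part after it (intercept only here).
  fm_zinb <- pscl::zeroinfl(y ~ x | 1, data = dat, dist = "negbin")

  # The usual extractor functions work as they do for glm objects.
  print(summary(fm_hurdle))
  print(coef(fm_zinb))
  print(AIC(fm_hurdle, fm_zinb))
}
```

The fitted objects respond to the same generics as glm objects (summary(), coef(), logLik(), AIC(), predict()), so they can be compared with the classical fits using familiar tools.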

Generalized linear models
Hurdle models
Zero-inflated models
Application and illustrations
Demand for medical care by the elderly
Poisson regression
Quasi-Poisson regression
Negative binomial regression
Hurdle regression
Zero-inflated regression
Comparison
Summary
Technical details for hurdle models
Technical details for zero-inflated models
Methods for fitted zero-inflated and hurdle models
Replication of textbook results