Abstract

Frequent problems in applied research that prevent the use of the classical Poisson log-linear model for count data include overdispersion, an excess of zeros relative to the Poisson distribution, correlated responses, and complex predictor structures comprising nonlinear effects of continuous covariates, interactions, or spatial effects. We propose a general class of Bayesian generalized additive models for zero-inflated and overdispersed count data within the framework of generalized additive models for location, scale, and shape (GAMLSS), where semiparametric predictors can be specified for several parameters of a count data distribution. As standard options for applied work, we consider the zero-inflated Poisson, the negative binomial, and the zero-inflated negative binomial distribution. The additive predictor specifications rely on basis function approximations for the different types of effects in combination with Gaussian smoothness priors. We develop Bayesian inference based on Markov chain Monte Carlo simulation techniques, where suitable proposal densities are constructed from iteratively weighted least squares approximations to the full conditionals. To ensure that the inference is practicable, we study theoretical properties such as the nontrivial question of whether the joint posterior is proper. The proposed approach is evaluated in simulation studies and applied to count data arising from patent citations and claim frequencies in car insurance. To compare models with respect to the response distribution, we use quantile residuals as an effective graphical device and scoring rules that quantify the predictive ability of the models. The deviance information criterion is used to select appropriate predictor specifications once a response distribution has been chosen. Supplementary materials for this article are available online.
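The three response distributions can be made concrete with a short sketch. It assumes one common parameterization (zero-inflation probability π, Poisson rate λ, negative binomial mean μ and scale δ with variance μ + μ²/δ) and adds randomized quantile residuals (Dunn and Smyth, 1996), one standard way to build quantile residuals for discrete responses: each count is mapped to a value that is standard normal if the fitted distribution is correct. All function names are hypothetical and not taken from the authors' software.

```python
import math
import random
from statistics import NormalDist


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability P(Y = y) with rate lam > 0, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def nb_pmf(y: int, mu: float, delta: float) -> float:
    """Negative binomial with mean mu and variance mu + mu**2 / delta (assumed parameterization)."""
    log_p = (math.lgamma(y + delta) - math.lgamma(delta) - math.lgamma(y + 1)
             + delta * math.log(delta / (delta + mu))
             + y * math.log(mu / (delta + mu)))
    return math.exp(log_p)


def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson_pmf(y, lam)


def zinb_pmf(y: int, mu: float, delta: float, pi: float) -> float:
    """Zero-inflated negative binomial: structural zero with probability pi, else NB(mu, delta)."""
    return (pi if y == 0 else 0.0) + (1.0 - pi) * nb_pmf(y, mu, delta)


def cdf(pmf, y: int, **params) -> float:
    """P(Y <= y) by summing the pmf; the empty sum gives 0 for y < 0."""
    return sum(pmf(k, **params) for k in range(y + 1))


def quantile_residual(pmf, y: int, rng: random.Random, **params) -> float:
    """Randomized quantile residual: Phi^{-1}(u) with u ~ Uniform(F(y - 1), F(y))."""
    lo, hi = cdf(pmf, y - 1, **params), cdf(pmf, y, **params)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf finite at the boundaries
    return NormalDist().inv_cdf(u)
```

For instance, `quantile_residual(zinb_pmf, 0, random.Random(1), mu=2.5, delta=1.5, pi=0.2)` returns one residual for an observed zero; plotting the residuals of all observations against standard normal quantiles gives the kind of graphical check described above. Summing the pmf in `cdf` is adequate for moderate counts; for large counts a library CDF would be preferable.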

Highlights

  • For analyzing count data responses with regression models, the log-linear Poisson model embedded in the exponential family regression framework provided by generalized linear or generalized additive models is still the standard approach

  • For exponential family regression with similar predictor types, the propriety of the posterior has been investigated, for example, in Fahrmeir and Kneib [2009] or Sun et al. [2001], and we will generalize these results to the GAMLSS framework

  • We developed numerically efficient Bayesian zero-inflated and overdispersed count data regression with semiparametric predictors as special cases of GAMLSS, relying on iteratively weighted least squares proposals
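To make the idea of iteratively weighted least squares (IWLS) proposals concrete, here is a minimal sketch for the simplest special case only: a Poisson model with log link and one coefficient block β with Gaussian prior precision K/τ². At the current state, the likelihood is approximated by a weighted least squares problem with working weights w_i = exp(η_i) and working observations z_i = η_i + (y_i − μ_i)/μ_i; the Gaussian N(m, P⁻¹) with P = X'WX + K/τ² and m = P⁻¹X'Wz then serves as the Metropolis-Hastings proposal. The zero-inflated and negative binomial models require distribution-specific working weights for each parameter (π, μ, δ), which are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

# Simplified stand-in: Poisson likelihood with log link only.
# ZIP/NB/ZINB models need other working weights for pi, mu and delta.


def iwls_proposal(beta, X, y, K, tau2):
    """Mean m and precision P of the Gaussian IWLS proposal at the current beta."""
    eta = X @ beta
    mu = np.exp(eta)
    w = mu                              # working weights (Poisson, log link)
    z = eta + (y - mu) / mu             # working observations
    P = X.T @ (w[:, None] * X) + K / tau2
    m = np.linalg.solve(P, X.T @ (w * z))
    return m, P


def log_posterior(beta, X, y, K, tau2):
    """Log full conditional of beta up to an additive constant."""
    eta = X @ beta
    return np.sum(y * eta - np.exp(eta)) - 0.5 * beta @ K @ beta / tau2


def log_gauss(x, m, P):
    """log N(x | m, P^{-1}) up to the constant -d/2 * log(2*pi)."""
    _, logdet = np.linalg.slogdet(P)
    d = x - m
    return 0.5 * logdet - 0.5 * d @ P @ d


def posterior_mode(X, y, K, tau2, n_iter=50):
    """Penalized IWLS (Fisher scoring): iterate beta <- m to find a starting value."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta, _ = iwls_proposal(beta, X, y, K, tau2)
    return beta


def iwls_mh_step(beta, X, y, K, tau2, rng):
    """One Metropolis-Hastings update of beta with an IWLS proposal."""
    m, P = iwls_proposal(beta, X, y, K, tau2)
    L = np.linalg.cholesky(P)                                   # P = L L^T
    prop = m + np.linalg.solve(L.T, rng.standard_normal(beta.size))  # draw from N(m, P^{-1})
    m_back, P_back = iwls_proposal(prop, X, y, K, tau2)         # moments for the reverse move
    log_alpha = (log_posterior(prop, X, y, K, tau2) - log_posterior(beta, X, y, K, tau2)
                 + log_gauss(beta, m_back, P_back) - log_gauss(prop, m, P))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return beta, False
```

The proposal is recomputed from the proposed value to obtain the reverse density, because the IWLS approximation depends on the state it is built around. In this sketch the chain is started at the penalized posterior mode; started far from the mode, IWLS proposals can be rejected for long stretches, so good starting values matter in practice.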

Summary

Introduction

For analyzing count data responses with regression models, the log-linear Poisson model embedded in the exponential family regression framework provided by generalized linear or generalized additive models is still the standard approach. For Poisson regression and negative binomial regression with a fixed scale parameter and no overdispersion, generalized additive models as developed in Hastie and Tibshirani [1990] and popularized by Wood [2006] provide a convenient framework that allows one to overcome the linearity assumption of generalized linear models when smooth effects of continuous covariates are to be combined in an additive predictor. The proposed approach supports the full flexibility of structured additive regression for specifying additive predictors for all parameters of the response distribution, including the success probability of the binary process and the scale parameter of the negative binomial distribution. It considerably extends the set of available predictor specifications for all parameters involved in zero-inflated and overdispersed count data regression.
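As an illustration of a basis function approximation combined with a Gaussian smoothness prior, the sketch below assumes P-splines, one common choice for nonlinear effects in structured additive regression: a cubic B-spline basis on equidistant knots, so that f(x) = B(x)γ, and a second-order random walk prior with precision K/τ², where K = D'D is built from second differences D. The same construction is reused for each distribution parameter and mapped through assumed response functions (exp for μ and δ, logistic for π). All names are hypothetical.

```python
import numpy as np


def bspline_basis(x, n_knots=12, degree=3):
    """B-spline design matrix on equidistant knots via the Cox-de Boor recursion.

    Returns an array of shape (len(x), n_knots + degree - 1).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / (n_knots - 1)
    # extend the knot grid by `degree` knots on each side of [lo, hi]
    knots = lo + h * np.arange(-degree, n_knots + degree)
    # degree 0: indicator of the knot interval [t_j, t_{j+1})
    B = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[:-(d + 1)]) / (d * h)
        right = (knots[d + 1:] - x[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B


def rw2_penalty(k):
    """Second-order random walk penalty K = D'D of rank k - 2."""
    D = np.diff(np.eye(k), n=2, axis=0)
    return D.T @ D


def log_smoothness_prior(gamma, K, tau2):
    """Partially improper Gaussian prior, up to a constant:
    -rank(K)/2 * log(tau2) - gamma' K gamma / (2 * tau2)."""
    rank = np.linalg.matrix_rank(K)
    return -0.5 * rank * np.log(tau2) - 0.5 * gamma @ K @ gamma / tau2


def zinb_parameters(B, gamma_mu, gamma_delta, gamma_pi):
    """Hypothetical helper: one additive predictor per parameter, mapped through
    assumed response functions (exp for mu and delta, logistic for pi)."""
    mu = np.exp(B @ gamma_mu)
    delta = np.exp(B @ gamma_delta)
    pi = 1.0 / (1.0 + np.exp(-(B @ gamma_pi)))
    return mu, delta, pi
```

The rows of `B` sum to one, and the penalty leaves constant and linear coefficient sequences unpenalized. This rank deficiency makes the prior only partially proper, which is one reason the propriety of the joint posterior is not automatic.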

Observation Models
Semiparametric Predictors
Prior Specifications
Special Cases
Inference
IWLS Proposals
Metropolis-Hastings Algorithm for Zero-Inflated Count Data Regression
Multilevel Framework
Theoretical Results & Numerical Details
Simulations
Additive Models
Geoadditive Models
Application
Summary and Conclusions
A Backfitting Algorithm
Computation of the Working Weights
Positive Definiteness of the Working Weights
C Propriety of the Posterior Distribution
Negative Binomial Regression
Simulation Setup
Results