Abstract

Most existing flexible count distributions allow only approximate inference when used in a regression context. This work proposes a new framework that provides an exact and flexible alternative for modeling and simulating count data with various types of dispersion (equi-, under-, and over-dispersion). The new method, referred to as “balanced discretization”, consists of discretizing continuous probability distributions while preserving expectations. Pseudo-random variates are easy to generate from the resulting balanced discrete distribution because it has a simple stochastic representation (probabilistic rounding) in terms of the continuous distribution. For illustrative purposes, we develop the family of balanced discrete gamma distributions, which can model equi-, under-, and over-dispersed count data. This family of count distributions is appropriate for building flexible count regression models because the expectation of the distribution has a simple expression in terms of its parameters. Using the Jensen–Shannon divergence measure, we show that under the equidispersion restriction, the family of balanced discrete gamma distributions is similar to the Poisson distribution. Based on this, we conjecture that a count regression model based on the balanced discrete gamma distribution, while covering all types of dispersion, will recover a near-Poisson model fit when the data are Poisson distributed.

Highlights

  • The regression analysis of count responses mostly relies on the Poisson model

  • It appears that the one-parameter balanced discrete gamma (BDG) distribution based count regression model will be an effective parsimonious model [30] that can be fit to observed data to check the appropriateness of an equidispersion model

  • While a BDG regression model will allow exact inference in flexible count modeling, testing for latent equidispersion will allow recovering a near Poisson regression model when supported by observed data

Summary

Introduction

The regression analysis of count responses mostly relies on the Poisson model. However, the equidispersion (variance equals mean) assumption of the Poisson distribution makes Poisson regression inappropriate in many situations where data show overdispersion (variance greater than mean) or underdispersion (variance less than mean). […] Poisson [5], and extended Poisson–Tweedie regressions [6]), which makes inference approximate with quasi-models. Another drawback is the lack of a simple expression for the model mean value (Conway–Maxwell–Poisson [7], double Poisson [8,9], gamma count [10], semi-nonparametric Poisson polynomial [11], and discrete Weibull [12] models). This work proposes a discretization procedure that starts from continuous probability distributions and constructs count models with (i) properly normalized probability mass functions for underdispersion, equidispersion, and overdispersion situations and (ii) simple expressions for the model mean values. The probabilistic rounding mechanism, expressed as a simple stochastic representation in terms of the continuous distribution, allows pseudo-random variates to be generated from the resulting balanced discrete distribution.
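
The probabilistic rounding representation described above can be illustrated with a short simulation. The sketch below is not taken from the paper: it assumes that a continuous draw X is rounded up to the next integer with probability equal to its fractional part and rounded down otherwise, so that E[Y | X] = X and hence E[Y] = E[X]. The function name balanced_discrete_gamma_rvs and the shape/rate parameterization of the gamma distribution are hypothetical choices made for illustration.

```python
import numpy as np


def balanced_discrete_gamma_rvs(shape, rate, size, rng=None):
    """Draw counts by probabilistic rounding of gamma variates.

    Assumption (not confirmed by the source text): each continuous draw x is
    mapped to floor(x) + 1 with probability x - floor(x), and to floor(x)
    otherwise, which preserves the expectation of the gamma distribution.
    """
    rng = np.random.default_rng(rng)
    # NumPy parameterizes the gamma distribution by shape and scale = 1 / rate.
    x = rng.gamma(shape, 1.0 / rate, size=size)
    lower = np.floor(x)
    frac = x - lower
    # Bernoulli(frac) indicator: round up with probability equal to the fractional part.
    round_up = rng.random(size) < frac
    return (lower + round_up).astype(np.int64)


# Usage: the sample mean should be close to the gamma mean shape / rate = 2.0.
samples = balanced_discrete_gamma_rvs(shape=4.0, rate=2.0, size=200_000, rng=0)
sample_mean = samples.mean()
# Index of dispersion (variance / mean): values below 1 indicate underdispersion,
# values above 1 indicate overdispersion.
dispersion_index = samples.var() / sample_mean
print(sample_mean, dispersion_index)
```

Under these assumptions the mean of the simulated counts depends only on shape / rate, which is the property that makes such a family convenient for regression: a link function can be placed directly on the mean, while the remaining parameter controls the dispersion.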

The Balanced Discretization Method
Notations
Reminders
Motivating Example and Definition
Probability Mass and Distribution Functions
Moments and Index of Dispersion
Conditional Distributions of Latent Continuous and Binary Outcomes
Link with Mean-Preserving Discretization
The Balanced Discrete Gamma Family
The Balanced Discrete Gamma Distribution
Comparison with Some Alternatives
Balanced Discretization Versus Discrete Concentration
Distance to the Poisson Distribution under Equidispersion
Conclusions
Methods