Marginalized mixture models for count data from multiple source populations

Habtamu K Benecha, Brian Neelon, John S Preisser, Kimon Divaris

doi:10.1186/s40488-017-0057-4

Abstract

Mixture distributions provide flexibility in modeling data collected from populations having unexplained heterogeneity. While interpretations of regression parameters from traditional finite mixture models are specific to unobserved subpopulations or latent classes, investigators are often interested in making inferences about the marginal mean of a count variable in the overall population. Recently, marginal mean regression modeling procedures for zero-inflated count outcomes have been introduced within the framework of maximum likelihood estimation of zero-inflated Poisson and negative binomial regression models. In this article, we propose marginalized mixture regression models based on two-component mixtures of non-degenerate count data distributions that provide directly interpretable estimates of exposure effects on the overall population mean of a count outcome. The models are examined using simulations and applied to two datasets, one from a double-blind dental caries incidence trial, and the other from a horticultural experiment. The finite sample performance of the proposed models is compared with each other and with marginalized zero-inflated count models, as well as ordinary Poisson and negative binomial regression.

Highlights

The analysis of data from populations with unexplained heterogeneity presents special challenges to researchers
In dental caries research and many other areas, proportions of observations with zero counts are often higher than expected under the Poisson or negative binomial distributions, and regression models based on these distributions may result in biased estimates and poor predictions
Traditional zero-inflated Poisson and negative binomial models assume that counts arise from a two-component mixture of a standard count distribution with a distribution degenerate at zero
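
As an illustration of the two-component structure described in the last highlight, the following is a minimal sketch of a zero-inflated Poisson probability mass function. The function names and the parameter names `pi` (excess-zero probability) and `mu` (Poisson mean) are chosen here for illustration and are not taken from the article's software.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) for a mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def zip_pmf(y, pi, mu):
    """Zero-inflated Poisson probability (illustrative sketch).

    With probability pi the observation comes from the component that is
    degenerate at zero; otherwise it is drawn from Poisson(mu).
    """
    base = (1.0 - pi) * poisson_pmf(y, mu)
    # Only y = 0 receives extra mass from the degenerate component.
    return pi + base if y == 0 else base
```

For any `pi` strictly between 0 and 1, `zip_pmf(0, pi, mu)` exceeds the plain Poisson probability of zero, which is the excess-zero behavior the highlight refers to.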

Summary

Introduction

The analysis of data from populations with unexplained heterogeneity presents special challenges to researchers.

Marginalized ZIP and ZINB models

To estimate the overall effects of covariates on the population mean, marginalized zero-inflated Poisson (Long et al 2014) and marginalized zero-inflated negative binomial (Preisser et al 2016) models specify parameters for the probability of being an excess zero (i.e., πi) and the marginal mean νi = E(yi) = (1 − πi)μi as logit(πi) = wiγ and log(νi) = xiβ, where β =
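
The marginalized zero-inflated Poisson parameterization can be sketched in a few lines. The sketch below assumes only the relations stated in the text (a logit link for πi, a log link for the marginal mean νi, and νi = (1 − πi)μi). The function and argument names are hypothetical, and the code is not the authors' implementation.

```python
import math

def expit(t):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-t))

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def mzip_components(w, gamma, x, beta):
    """Return (pi, nu, mu) for one observation (illustrative sketch)."""
    pi = expit(dot(w, gamma))    # logit(pi_i) = w_i gamma
    nu = math.exp(dot(x, beta))  # log(nu_i) = x_i beta
    mu = nu / (1.0 - pi)         # implied by nu_i = (1 - pi_i) mu_i
    return pi, nu, mu

def mzip_loglik_obs(y, w, gamma, x, beta):
    """Log-likelihood contribution of one count y under the sketch above."""
    pi, nu, mu = mzip_components(w, gamma, x, beta)
    if y == 0:
        # Excess zeros plus zeros from the Poisson component.
        return math.log(pi + (1.0 - pi) * math.exp(-mu))
    log_pois = -mu + y * math.log(mu) - math.lgamma(y + 1)
    return math.log(1.0 - pi) + log_pois

# Hypothetical covariates and coefficients, for illustration only.
pi, nu, mu = mzip_components([1.0, 0.5], [-1.0, 0.4], [1.0, 1.0], [0.2, 0.3])
```

Because β enters through log(νi) rather than log(μi), exp(βj) in this sketch acts as a multiplicative effect on the overall population mean and not on the mean of a latent class.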

Results

Conclusion