A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution.

Xavier A Harrison

doi:10.7717/peerj.1114

Open Access

Journal: PeerJ	Publication Date: Jul 21, 2015
Citations: 319	License type: cc-by

Affiliation: Zoological Society of London

Abstract

Overdispersion is a common feature of models of biological data, but researchers often fail to model the excess variation driving the overdispersion, resulting in biased parameter estimates and standard errors. Quantifying and modeling overdispersion when it is present is therefore critical for robust biological inference. One means to account for overdispersion is to add an observation-level random effect (OLRE) to a model, where each data point receives a unique level of a random effect that can absorb the extra-parametric variation in the data. Although some studies have investigated the utility of OLRE to model overdispersion in Poisson count data, studies doing so for Binomial proportion data are scarce. Here I use a simulation approach to investigate the ability of both OLRE models and Beta-Binomial models to recover unbiased parameter estimates in mixed effects models of Binomial data under various degrees of overdispersion. In addition, as ecologists often fit random intercept terms to models when the random effect sample size is low (<5 levels), I investigate the performance of both model types under a range of random effect sample sizes when overdispersion is present. Simulation results revealed that the efficacy of OLRE depends on the process that generated the overdispersion; OLRE failed to cope with overdispersion generated from a Beta-Binomial mixture model, leading to biased slope and intercept estimates, but performed well for overdispersion generated by adding random noise to the linear predictor. Comparison of parameter estimates from an OLRE model with those from its corresponding Beta-Binomial model readily identified when OLRE were performing poorly due to disagreement between effect sizes, and this strategy should be employed whenever OLRE are used for Binomial data to assess their reliability. Beta-Binomial models performed well across all contexts, but showed a tendency to underestimate effect sizes when modelling non-Beta-Binomial data. 
Finally, both OLRE and Beta-Binomial models performed poorly when models contained <5 levels of the random intercept term, especially for estimating variance components, and this effect appeared independent of total sample size. These results suggest that OLRE are a useful tool for modelling overdispersion in Binomial data, but that they do not perform well in all circumstances and researchers should take care to verify the robustness of parameter estimates of OLRE models.
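The two overdispersion-generating processes contrasted in the abstract can be sketched in a few lines. This is a minimal illustration, not the simulation code used in the paper: the function names, the parameter values, and the mean/precision parameterisation of the Beta distribution (shape parameters `mu * phi` and `(1 - mu) * phi`) are assumptions made here for clarity, and no random intercept term is included.

```python
import math
import random


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def draw_binomial(rng, n_trials, p):
    """Draw one Binomial(n_trials, p) count as a sum of Bernoulli trials."""
    return sum(1 for _ in range(n_trials) if rng.random() < p)


def simulate_olre(rng, x, n_trials, intercept, slope, sigma):
    """Overdispersion from Normal(0, sigma) noise added to the linear predictor."""
    counts = []
    for xi in x:
        eta = intercept + slope * xi + rng.gauss(0.0, sigma)
        counts.append(draw_binomial(rng, n_trials, inv_logit(eta)))
    return counts


def simulate_beta_binomial(rng, x, n_trials, intercept, slope, phi):
    """Overdispersion from a Beta-Binomial mixture.

    Assumed parameterisation: p_i ~ Beta(mu_i * phi, (1 - mu_i) * phi),
    so smaller phi means stronger overdispersion.
    """
    counts = []
    for xi in x:
        mu = inv_logit(intercept + slope * xi)
        p = rng.betavariate(mu * phi, (1.0 - mu) * phi)
        counts.append(draw_binomial(rng, n_trials, p))
    return counts


def pearson_dispersion(y, x, n_trials, intercept, slope):
    """Mean squared Pearson residual against the known noise-free means.

    Values near 1 are consistent with pure Binomial variation;
    values well above 1 indicate overdispersion.
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = inv_logit(intercept + slope * xi)
        total += (yi - n_trials * mu) ** 2 / (n_trials * mu * (1.0 - mu))
    return total / len(y)


# Hypothetical settings chosen only for illustration.
rng = random.Random(42)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
datasets = {
    "Binomial": simulate_olre(rng, x, 20, 0.0, 1.0, 0.0),
    "OLRE noise": simulate_olre(rng, x, 20, 0.0, 1.0, 1.0),
    "Beta-Binomial": simulate_beta_binomial(rng, x, 20, 0.0, 1.0, 4.0),
}
for label, y in datasets.items():
    print(label, round(pearson_dispersion(y, x, 20, 0.0, 1.0), 2))
```

Under the parameterisation assumed here, the Beta-Binomial variance exceeds the Binomial variance by a factor of 1 + (n - 1)/(phi + 1), so the Beta-Binomial data set above (n = 20, phi = 4) should show a dispersion statistic near 4.8, while the pure Binomial data set should sit close to 1.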

Highlights

Binomial data are frequently encountered in the fields of ecology and evolution.
Although the overdispersed Binomial/observation-level random effect (OLRE) model did not suffer the same bias, the standard error of all estimates increased in tandem with overdispersion.
OLRE are highly sensitive to the mechanism generating the overdispersion in the data, yielding large bias when applied to Beta-Binomial data.

Summary

Introduction

Researchers often wish to know what factors determine the proportion of offspring sired by a focal individual (Tyler et al., 2013), the proportion of eggs of a clutch that successfully hatch (Harrison et al., 2013a), or the prevalence of disease in a population (Bielby et al., 2014). To determine which factors drive variation in the proportion data of interest, researchers often fit Binomial models to their data and model the Binomial mean as a function of covariates. Such data are frequently overdispersed, meaning that they show more variance than the Binomial distribution predicts. Failing to deal with overdispersion can lead to biased parameter estimates and standard errors in these models (Hilbe, 2011; Harrison, 2014), potentially leading to false conclusions regarding which covariates truly influence the outcome variable. It is therefore crucial that we find robust means to deal with overdispersion in order to correctly identify the biological processes underlying our observed Binomial data.