Abstract

Count and proportion data may present overdispersion, i.e., greater variability than expected under the Poisson and binomial models, respectively. Different extended generalized linear models that allow for overdispersion may be used to analyze this type of data, such as models that use a generalized variance function, random-effects models, zero-inflated models and compound distribution models. Assessing goodness-of-fit and verifying the assumptions of these models is not an easy task, and half-normal plots with a simulated envelope are a possible solution to this problem. These plots are a useful indicator of goodness-of-fit that may be used with any generalized linear model or extension. Functions that generated these plots were widely used by GLIM users; however, in the open-source software R, such functions were not yet available on the Comprehensive R Archive Network (CRAN). We describe a new R package, hnp, that may be used to generate half-normal plots with a simulated envelope for residuals from different types of models. The function hnp() can be used together with a range of model-fitting packages in R that extend the basic generalized linear model fitting in glm(), and it is written so that it is relatively easy to extend to new model classes and different diagnostics. We illustrate its use on a range of examples, including continuous and discrete responses, and show how it can be used to inform model selection and diagnose overdispersion.
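Overdispersion in count data can be made concrete with a small numerical check. The following is a minimal pure-Python sketch, not part of hnp: it assumes an intercept-only Poisson model (whose fitted mean is simply the sample mean) and uses the Pearson dispersion statistic X²/(n − 1), which should be close to 1 when the Poisson variance assumption holds. The gamma-Poisson mixture used to generate overdispersed counts, the seed, and the helper names `rpois` and `poisson_dispersion` are illustrative assumptions.

```python
import math
import random


def rpois(lam, rng):
    """One Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_dispersion(counts):
    """Pearson X^2 / (n - 1) for an intercept-only Poisson fit (fitted mean = sample mean)."""
    n = len(counts)
    ybar = sum(counts) / n
    x2 = sum((y - ybar) ** 2 for y in counts) / ybar
    return x2 / (n - 1)


rng = random.Random(2017)
mu, k, n = 4.0, 2.0, 500
# Equidispersed counts: Var(Y) = mu
poisson_counts = [rpois(mu, rng) for _ in range(n)]
# Gamma-Poisson (negative binomial) mixture: Var(Y) = mu + mu**2 / k
mixed_counts = [rpois(rng.gammavariate(k, mu / k), rng) for _ in range(n)]

print(round(poisson_dispersion(poisson_counts), 2))  # close to 1
print(round(poisson_dispersion(mixed_counts), 2))    # close to 1 + mu / k = 3
```

A single summary ratio like this only flags that the variance is larger than the model allows; it does not show where the misfit occurs, which is one motivation for graphical diagnostics such as half-normal plots.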

Highlights

  • An important step of statistical modeling of any sort is to perform diagnostic analyses to assess goodness-of-fit

  • When fitting linear models under the normality assumption, goodness-of-fit can be checked using formal tests, such as the Shapiro-Wilk test for residual normality (Shapiro and Wilk 1965) or the Bartlett test for variance homogeneity (Bartlett 1937). These tests may be unreliable in many circumstances, such as with small sample sizes, and graphical techniques usually provide a better assessment of model goodness-of-fit

  • We developed the R (R Core Team 2017) package hnp (Moral, Hinde, and Demétrio 2017) that provides functions for generating half-normal plots with a simulated envelope for a range of generalized linear models and extensions


Summary

Introduction

An important step of statistical modeling of any sort is to perform diagnostic analyses to assess goodness-of-fit. When fitting linear models under the normality assumption, goodness-of-fit can be checked using formal tests, such as the Shapiro-Wilk test for residual normality (Shapiro and Wilk 1965) or the Bartlett test for variance homogeneity (Bartlett 1937). These tests may be unreliable in many circumstances, such as with small sample sizes, and graphical techniques usually provide a better assessment of model goodness-of-fit.

When fitting generalized linear models, or different types of extended models (e.g., zero-inflated models and mixed models), half-normal plots with simulated envelopes are useful to assess goodness-of-fit, especially when analyzing overdispersed data. The purpose of the envelope is not to provide a region for acceptance or rejection of observations, but to serve as a guide to what to expect under a well-fitted model. These plots are useful for detecting possible outliers and overdispersion, and for checking whether the link function and/or error distribution were properly specified (Demétrio, Hinde, and Moral 2014).

Package hnp is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=hnp.
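The idea of a simulated envelope can be sketched as follows: fit the model, compute sorted absolute residuals, simulate new responses from the fitted model, refit and recompute sorted absolute residuals for each simulated sample, and take pointwise quantiles at each order statistic. This is a minimal pure-Python sketch under stated assumptions, not the hnp implementation: it assumes an intercept-only Poisson model, deviance residuals, 99 simulations, pointwise 2.5%, 50% and 97.5% quantiles, and half-normal scores Φ⁻¹((i + n − 1/8)/(2n + 1/2)). All function names (`half_normal_envelope`, `fit_poisson`, `sim_poisson`, `deviance_residuals`) are hypothetical. The fit, simulate and diagnose steps are passed in as separate callables to mirror, loosely, the idea of extending the procedure to new model classes and diagnostics.

```python
import math
import random
from statistics import NormalDist


def half_normal_scores(n):
    """Approximate expected half-normal order statistics for i = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((i + n - 1 / 8) / (2 * n + 1 / 2)) for i in range(1, n + 1)]


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already sorted list."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def half_normal_envelope(y, fit, simulate, diagnose, n_sim=99, conf=0.95, seed=1):
    """Pointwise simulated envelope for sorted absolute residuals (hypothetical helper).

    fit(y) -> params; simulate(params, n, rng) -> new response; diagnose(y, params) -> residuals.
    """
    rng = random.Random(seed)
    n = len(y)
    params = fit(y)
    observed = sorted(abs(r) for r in diagnose(y, params))
    sims = []
    for _ in range(n_sim):
        y_star = simulate(params, n, rng)  # response drawn from the fitted model
        sims.append(sorted(abs(r) for r in diagnose(y_star, fit(y_star))))  # refit, then residuals
    lower, median, upper = [], [], []
    for i in range(n):
        column = sorted(s[i] for s in sims)  # i-th order statistic across simulations
        lower.append(quantile(column, (1 - conf) / 2))
        median.append(quantile(column, 0.5))
        upper.append(quantile(column, (1 + conf) / 2))
    outside = sum(not (lo <= r <= up) for r, lo, up in zip(observed, lower, upper))
    return half_normal_scores(n), observed, lower, median, upper, outside


def rpois(lam, rng):
    """One Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_poisson(y):
    return sum(y) / len(y)  # MLE of the mean in an intercept-only Poisson model


def sim_poisson(mu, n, rng):
    return [rpois(mu, rng) for _ in range(n)]


def deviance_residuals(y, mu):
    res = []
    for yi in y:
        term = yi * math.log(yi / mu) if yi > 0 else 0.0  # y*log(y/mu) taken as 0 when y = 0
        dev = max(2 * (term - (yi - mu)), 0.0)  # guard against tiny negative rounding error
        res.append(math.copysign(math.sqrt(dev), yi - mu))
    return res


rng = random.Random(42)
datasets = {
    "Poisson": sim_poisson(4.0, 60, rng),
    # Gamma-Poisson mixture with shape 1: Var(Y) = 4 + 16 = 20, i.e. overdispersed
    "overdispersed": [rpois(rng.gammavariate(1.0, 4.0), rng) for _ in range(60)],
}
n_outside = {}
for label, y in datasets.items():
    result = half_normal_envelope(y, fit_poisson, sim_poisson, deviance_residuals)
    n_outside[label] = result[-1]
    print(label, "points outside the envelope:", n_outside[label])
```

Plotting the observed values against the half-normal scores together with the three bands gives the familiar plot. In this sketch, the Poisson-fitted overdispersed sample should leave many more points above the upper band than the genuinely Poisson sample, which is the kind of pattern the plot is meant to reveal rather than a formal test.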

Generalized linear models and overdispersion
Half-normal plots with simulated envelopes
Implemented model classes
Simulation procedures
New class implementation
Examples
Overdispersed proportion data
Overdispersed count data
Implementing new model classes
Discussion