Abstract
The effect of explanatory environmental variables on a species' distribution is often assessed using a count regression model. Poisson generalized linear models or negative binomial models are common, but the traditional approach of modelling the mean after log or square root transformation remains popular and in some cases is even advocated.
We propose a novel framework of linear models for count data. Similar to the traditional approach, the new models apply a transformation to count responses; however, this transformation is estimated from the data and not defined a priori. In contrast to simple least-squares fitting and in line with Poisson or negative binomial models, the exact discrete likelihood is optimized for parameter estimation and inference. Simple interpretation of effects in the linear predictors is possible. Count transformation models provide a new approach to regressing count data in a distribution-free yet fully parametric fashion, obviating the need to a priori commit to a specific parametric family of distributions or to a specific transformation. The models are a generalization of discrete Weibull models for counts and are thus able to handle over- and underdispersion.
We demonstrate empirically that the models are more flexible than Poisson or negative binomial models but still maintain interpretability of multiplicative effects. A re-analysis of deer–vehicle collisions and the results of artificial simulation experiments provide evidence of the practical applicability of the model framework.
In ecology studies, uncertainties regarding whether and how to transform count data can be resolved in the framework of count transformation models, which were designed to simultaneously estimate an appropriate transformation and the linear effects of environmental variables by maximizing the exact count log-likelihood. The application of data-driven transformations allows over- and underdispersion to be addressed in a model-based approach. Models in this class can be compared to Poisson or negative binomial models using the in- or out-of-sample log-likelihood. Extensions to nonlinear additive or interaction effects, correlated observations, hurdle-type models and other more complex situations are possible. A free software implementation is available in the cotram add-on package to the R system for statistical computing.
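The log-likelihood comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the cotram implementation: the counts are made up for illustration, the function names are hypothetical, and the negative binomial dispersion is a rough method-of-moments value rather than a maximum-likelihood estimate.

```python
import math

def poisson_loglik(counts, mu):
    """Exact Poisson log-likelihood for i.i.d. counts with mean mu."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)

def negbin_loglik(counts, mu, size):
    """Negative binomial log-likelihood with mean mu and variance mu + mu^2 / size."""
    p = size / (size + mu)
    return sum(
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1.0 - p)
        for k in counts
    )

# Made-up, clearly overdispersed toy counts (an assumption for illustration only).
toy_counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 21]

n = len(toy_counts)
mu_hat = sum(toy_counts) / n                             # sample mean
var_hat = sum((k - mu_hat) ** 2 for k in toy_counts) / n

# Method-of-moments dispersion; only defined when the variance exceeds the mean.
size_hat = mu_hat ** 2 / (var_hat - mu_hat)

ll_pois = poisson_loglik(toy_counts, mu_hat)
ll_nb = negbin_loglik(toy_counts, mu_hat, size_hat)

print("Poisson log-likelihood:          ", ll_pois)
print("Negative binomial log-likelihood:", ll_nb)
```

On these toy counts the negative binomial attains the larger in-sample log-likelihood. Because in-sample values tend to favour the more flexible model, the same functions would in practice be evaluated on held-out counts for an out-of-sample comparison.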
Highlights
Information represented by counts is ubiquitous in ecology
We develop the novel model starting with a generalized linear model (GLM) for a binary event (Y ≤ k) defined by some cut-off point k
In our re-analysis, we explore the estimates and properties of count regression models explaining how the risk of roe deer–vehicle collisions varies over days as well as across weeks, seasons and the whole year
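The construction from a binary-event GLM can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the logit link, the sign convention h(k) − η, and the fixed transformation `h` with its parameters `a` and `b` are all hypothetical choices made here, whereas the framework described above estimates the transformation from the data.

```python
import math

def expit(z):
    """Logistic CDF (inverse logit), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def h(k, a=-2.0, b=1.5):
    """Hypothetical increasing transformation of the cut-off k (fixed here for illustration)."""
    return a + b * math.log(k + 1.0)

def cdf(k, eta):
    """P(Y <= k | x) from the binary-event GLM; eta stands for the linear predictor."""
    if k < 0:
        return 0.0
    return expit(h(k) - eta)

def pmf(k, eta):
    """P(Y = k | x) as the difference of two adjacent cut-off probabilities."""
    return cdf(k, eta) - cdf(k - 1, eta)

def log_likelihood(counts, etas):
    """Exact discrete log-likelihood of observed counts given their linear predictors."""
    return sum(math.log(pmf(k, eta)) for k, eta in zip(counts, etas))

# Probabilities of the first few counts for one (assumed) linear predictor value.
for k in range(5):
    print(k, pmf(k, eta=0.5))
```

Under the assumed sign convention, a larger linear predictor lowers every cut-off probability P(Y ≤ k) and therefore shifts probability mass towards larger counts.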
Summary
Information represented by counts is ubiquitous in ecology. Perhaps the most obvious instance of ecological count data is animal abundances, which are determined either directly, for example by birdwatchers, or indirectly, by the counting of surrogates, for example the number of deer–vehicle collisions as a proxy for roe deer abundance. Although it is clear that the normal assumption log(Y + 1) ∣ x ∼ N(α + x⊤β, σ²) is incorrect (the count data are still discrete after transformation) and that the wrong likelihood is maximized by applying least-squares to log(y + 1) for parameter estimation and inference, this approach is still broadly used both in practice and in theory (e.g. De Felipe, Sáez-Gómez, & Camacho, 2019; Dean, Voss, & Draguljić, 2017; Gotelli & Ellison, 2013; Ives, 2015; Mooney et al., 2016). As a compromise between the two extremes of using rather strict count distribution models (such as the Poisson or negative binomial) and the analysis of transformed counts by normal linear regression models, we suggest a novel class of transformation models for count data that combine the strengths of both approaches.
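The difference between the two likelihoods can be written out explicitly. The notation below is partly assumed rather than taken from the text: φ is the standard normal density, h an increasing transformation of the counts, F a chosen cumulative distribution function (for example the logistic), and the sign convention h(y) − x⊤β is an assumption.

```latex
% Least squares on log(y + 1) maximizes a continuous normal log-likelihood,
% which treats the transformed counts as if they could take any real value:
\ell_{\mathrm{LS}}(\alpha, \beta, \sigma)
  = \sum_{i=1}^{n} \log\left\{ \frac{1}{\sigma}\,
      \phi\!\left( \frac{\log(y_i + 1) - \alpha - x_i^\top \beta}{\sigma} \right) \right\}

% An exact count log-likelihood instead sums log-probabilities of the
% observed integers, each a difference of two adjacent cut-off probabilities:
\ell(h, \beta)
  = \sum_{i=1}^{n} \log\left\{ F\big(h(y_i) - x_i^\top \beta\big)
      - F\big(h(y_i - 1) - x_i^\top \beta\big) \right\},
  \qquad F\big(h(-1) - x^\top \beta\big) := 0
```

The first expression is a sum of log-densities and the second a sum of log-probabilities, so only the second respects the discreteness of the counts.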