Abstract

Integrated distribution models (IDMs) predict where species might occur using data from multiple sources, a technique thought to be especially useful when data from any individual source are scarce. Recent advances allow such models to include latent terms that account for dependence within and between data sources, but the resulting models are computationally challenging to fit. We propose a fast new methodology for fitting integrated distribution models to presence/absence and presence‐only data, via a spatial random effects approach combined with automatic differentiation. We have written an R package, scampr, for straightforward implementation of our approach. We use simulation to demonstrate that our approach has comparable performance to INLA (a common framework for fitting IDMs), but with computation times up to an order of magnitude faster. We also use simulation to examine when IDMs can be expected to outperform models fitted to a single data source, and find that the benefit gained from using an IDM depends on the relative amount of additional information contributed by the second data source. We apply our method to predict the distributions of 29 plant species in New South Wales (NSW), Australia, and find particular benefit in predictive performance when data from a single source are scarce and when the comparison is with models for presence‐only data. Our faster methods of fitting IDMs make it feasible to explore the model space more deeply (e.g. comparing different ways to model latent terms) and, in future work, to consider extensions to more complex models, for example the multi‐species setting.

