Abstract

Building reliable species distribution models (SDMs) from presence‐only information requires a good understanding of the spatial variation in sampling effort. However, in most cases the sampling effort is unknown, which leads to biases in SDMs. This study proposes a method to jointly estimate the parameters of sampling effort and species densities to avoid such biases. The method is particularly suited to the analysis of massive but highly heterogeneous presence‐only data. It estimates the variation in sampling effort over the units of a spatial mesh jointly with the environmental density of multiple species, using a marked Poisson process model. Using simulations with realistic settings, we examined the performance and robustness of the parameter estimates. We also analysed a large‐scale citizen science dataset with highly heterogeneous sampling (Pl@ntNet), including around 300,000 occurrences of 150 plant species. We found that sampling effort was correctly estimated when the true sampling effort was constant within the cells of a spatial mesh. Estimation bias arose when sampling effort and environmental drivers strongly covaried within cells. Otherwise, the inference was correct and robust to sampling variation within cells. Running the model on real occurrences of 150 plant species provided an estimated map of relative sampling effort for 15% of the French territory. We also found that the density estimated for an exotic invasive plant was consistent with prior data. This is the first method to jointly estimate species densities, as a function of the environment, and sampling effort, as an explicit spatial function, from occurrence data of multiple species. An asset of the method is that a few frequently observed species contribute greatly to the correct estimation of sampling effort, thereby improving density estimation for all other species. This approach can thus provide reliable SDMs for large opportunistic presence‐only datasets with broad spatial variation in sampling effort and many species, such as datasets from citizen science programmes.
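The joint model described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions: a single environmental covariate z, log-linear species densities exp(α_i + β_i·z), and a sampling effort exp(γ_c) that is constant within each mesh cell c and shared by all species (the species label plays the role of the mark). It also swaps the point-process likelihood for Poisson counts on a pixel grid, fitted with a hand-written IRLS Poisson regression. All parameter values and function names (`simulate`, `design_matrix`, `fit_poisson_irls`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch, not the authors' implementation: a gridded (pixel-count)
# approximation of a multi-species Poisson process. Species i has density
# exp(alpha_i + beta_i * z); all species are observed through the same
# sampling effort exp(gamma_c), constant within mesh cell c.

ALPHA = np.array([1.0, 0.2, -0.5])       # species intercepts (assumed values)
BETA = np.array([0.7, -0.4, 0.0])        # species responses to covariate z
GAMMA = np.array([0.0, 0.8, -0.5, 1.5])  # log sampling effort per cell; cell 0 = reference


def simulate(pixels_per_cell=200, seed=0):
    rng = np.random.default_rng(seed)
    cell = np.repeat(np.arange(GAMMA.size), pixels_per_cell)  # mesh cell of each pixel
    z = rng.normal(size=cell.size)                            # environmental covariate per pixel
    log_mu = ALPHA[:, None] + BETA[:, None] * z[None, :] + GAMMA[cell][None, :]
    counts = rng.poisson(np.exp(log_mu))                      # counts[i, p]: species i in pixel p
    return cell, z, counts


def design_matrix(cell, z, n_species, n_cells):
    n_pix = z.size
    sp = np.repeat(np.arange(n_species), n_pix)  # species of each row (species-major order)
    px = np.tile(np.arange(n_pix), n_species)    # pixel of each row
    x_int = np.eye(n_species)[sp]                # one intercept per species
    x_slope = x_int * z[px][:, None]             # one environmental slope per species
    x_cell = np.eye(n_cells)[cell[px]][:, 1:]    # shared cell effects; cell 0 dropped (identifiability)
    return np.hstack([x_int, x_slope, x_cell])


def fit_poisson_irls(X, y, n_iter=100, tol=1e-8):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    mu = y + 0.5                    # starting fitted values, avoids log(0)
    eta = np.log(mu)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        work = eta + (y - mu) / mu  # working response
        XtW = X.T * mu              # Poisson weights equal mu under the log link
        new = np.linalg.solve(XtW @ X, XtW @ work)
        done = np.max(np.abs(new - theta)) < tol
        theta = new
        eta = X @ theta
        mu = np.exp(eta)
        if done:
            break
    return theta


cell, z, counts = simulate()
S, C = ALPHA.size, GAMMA.size
theta = fit_poisson_irls(design_matrix(cell, z, S, C), counts.ravel().astype(float))
print("beta  true:", BETA, "estimated:", theta[S:2 * S].round(2))
print("gamma true:", GAMMA[1:], "estimated:", theta[2 * S:].round(2))
```

Because only the product of density and effort is observed, the cell effects are identifiable only relative to a reference cell (cell 0 here), so the sketch recovers a relative sampling effort. The shared γ values are estimated from all species at once, so abundant species constrain them most, in line with the abstract's point about frequently observed species. Effort is constant within cells in this simulation, which is the favourable case reported above.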

Highlights

  • The objective of the present study is to propose a joint estimation of spatial sampling effort and species ecological niches, to alleviate biases in SDMs due to heterogeneous sampling


Summary

Introduction

Understanding biodiversity dynamics and designing conservation strategies require characterizing and analysing the distribution of species in space and time. The observed distribution of species occurrences depends on both the actual species abundance and the sampling effort of observers. Species occurrence data have become widely available from worldwide citizen science programmes and naturalist community platforms (e.g. iNaturalist, eBird, Pl@ntNet, Naturgucker; see Chandler et al., 2017), in part thanks to new digital tools and smartphone applications (Teacher et al., 2013). Contributors do not follow a planned sampling protocol and submit observations of specimens that are remarkable, atypical or new to them.

We uniformly drew a fixed number of background points per sampling cell, as described in Appendix D. This avoided the problem of uniform sampling over the whole study area, namely cells with no background points. We could fit this model on a laptop with the R package glmnet (it requires about 20 GB of RAM overall).
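The background-point scheme mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of Appendix D: it assumes square mesh cells of equal size and quadrature weights equal to each cell's area divided by the number of points drawn in it, and the function names (`background_per_cell`, `background_uniform`, `empty_cells`) are hypothetical. It contrasts a fixed number of uniform draws per cell with the same total number of uniform draws over the whole study area.

```python
import numpy as np

# Hypothetical sketch: background (quadrature) points for a mesh of square cells.
# Drawing a fixed number k of uniform points inside every cell guarantees that
# no cell is left without background points, unlike uniform draws over the whole area.


def background_per_cell(n_x, n_y, cell_size, k, rng):
    """k uniform points inside each cell of an n_x-by-n_y mesh of square cells."""
    ix, iy = np.meshgrid(np.arange(n_x), np.arange(n_y), indexing="ij")
    ix = np.repeat(ix.ravel(), k)  # cell column of each point
    iy = np.repeat(iy.ravel(), k)  # cell row of each point
    x = (ix + rng.uniform(size=ix.size)) * cell_size
    y = (iy + rng.uniform(size=iy.size)) * cell_size
    # Assumed weighting: each point stands for 1/k of its cell's area.
    weight = np.full(ix.size, cell_size ** 2 / k)
    return x, y, ix * n_y + iy, weight


def background_uniform(n_x, n_y, cell_size, n_total, rng):
    """n_total uniform points over the whole study area, with their cell ids."""
    x = rng.uniform(0.0, n_x * cell_size, n_total)
    y = rng.uniform(0.0, n_y * cell_size, n_total)
    ix = np.minimum((x // cell_size).astype(int), n_x - 1)
    iy = np.minimum((y // cell_size).astype(int), n_y - 1)
    return x, y, ix * n_y + iy


def empty_cells(cell_ids, n_cells):
    """Number of cells that received no background point."""
    return n_cells - np.unique(cell_ids).size


rng = np.random.default_rng(1)
n_x = n_y = 30
_, _, strat_ids, w = background_per_cell(n_x, n_y, 1.0, k=1, rng=rng)
_, _, unif_ids = background_uniform(n_x, n_y, 1.0, n_x * n_y, rng)
print("empty cells, fixed number per cell:", empty_cells(strat_ids, n_x * n_y))  # 0 by construction
print("empty cells, uniform over the area:", empty_cells(unif_ids, n_x * n_y))
print("total weight (study area):", w.sum())
```

With one point per cell on average, uniform draws over the whole area leave each cell empty with probability close to e^(-1), so roughly a third of the cells get no background point; drawing a fixed number per cell rules this out by construction.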
