Abstract
Geo‐referenced species occurrences from public databases have become essential to biodiversity research and conservation. However, geographical biases are widely recognized as a factor limiting the usefulness of such data for understanding species diversity and distribution. In particular, differences in sampling intensity across a landscape due to differences in human accessibility are ubiquitous but may differ in strength among taxonomic groups and data sets. Although several factors have been described to influence human access (such as presence of roads, rivers, airports and cities), quantifying their specific and combined effects on recorded occurrence data remains challenging. Here we present sampbias, an algorithm and software for quantifying the effect of accessibility biases in species occurrence data sets. sampbias uses a Bayesian approach to estimate how sampling rates vary as a function of proximity to one or multiple bias factors. The results are comparable among bias factors and data sets. We demonstrate the use of sampbias on a data set of mammal occurrences from the island of Borneo, showing a high biasing effect of cities and a moderate effect of roads and airports. sampbias is implemented as a well‐documented, open‐access and user‐friendly R package that we hope will become a standard tool for anyone working with species occurrences in ecology, evolution, conservation and related fields.
Highlights
Available data sets of geo-referenced species occurrences, such as those provided by the Global Biodiversity Information Facility, have become a fundamental resource in the biological sciences, especially in biogeography, conservation and macroecology.
Sampling biases that may affect the recording of species occurrences include the under-sampling of specific taxa (‘taxonomic bias’, e.g. birds versus nematodes), specific geographic regions (‘geographic bias’, e.g. accessible versus remote areas) and specific temporal periods (‘temporal bias’, e.g. wet versus dry season).
We present sampbias ver. 1.0.4, a probabilistic method to quantify accessibility bias in data sets of species occurrences. sampbias is implemented as a user-friendly R package and uses a Bayesian approach to address three questions: 1) How strong is the accessibility bias in a given data set? 2) How strongly does each bias factor contribute to the overall accessibility bias? 3) How is accessibility bias distributed in space?
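To make the idea of a sampling rate that depends on proximity to a bias factor more concrete, here is a minimal sketch in Python. It is not the sampbias implementation, and its model form is an assumption made for illustration: occurrence counts per grid cell are taken to be Poisson distributed, with a rate that decays exponentially with distance to a single bias factor (say, the nearest road). The parameter names `q` (rate at distance zero) and `w` (decay per km), the simulated data, the flat priors and the grid approximation of the posterior are all hypothetical choices.

```python
import math
import random

# Illustrative assumptions (not taken from sampbias itself):
TRUE_Q = 5.0  # hypothetical sampling rate in a cell at distance 0
TRUE_W = 0.3  # hypothetical decay of the sampling rate per km


def poisson_draw(lam, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def log_likelihood(q, w, total_count, count_weighted_dist, decay_sum):
    """Poisson log-likelihood for rates q * exp(-w * d_i), constants dropped.

    sum_i [c_i * log(rate_i) - rate_i] simplifies to
    C * log(q) - w * sum_i(c_i * d_i) - q * sum_i(exp(-w * d_i)).
    """
    return total_count * math.log(q) - w * count_weighted_dist - q * decay_sum


# Simulate 400 grid cells at random distances (km) from the bias factor.
rng = random.Random(42)
distances = [rng.uniform(0.0, 20.0) for _ in range(400)]
counts = [poisson_draw(TRUE_Q * math.exp(-TRUE_W * d), rng) for d in distances]

# Summaries of the data that the likelihood needs.
total_count = sum(counts)
count_weighted_dist = sum(c * d for c, d in zip(counts, distances))

# Grid approximation of the joint posterior under flat priors on the grid.
q_grid = [0.5 + 0.25 * i for i in range(39)]  # 0.5 to 10.0
w_grid = [0.01 * j for j in range(101)]       # 0.0 to 1.0

log_post = {}
for w in w_grid:
    decay_sum = sum(math.exp(-w * d) for d in distances)
    for q in q_grid:
        log_post[(q, w)] = log_likelihood(
            q, w, total_count, count_weighted_dist, decay_sum
        )

# Normalise in a numerically stable way and compute posterior means.
max_lp = max(log_post.values())
weights = {key: math.exp(lp - max_lp) for key, lp in log_post.items()}
norm = sum(weights.values())
q_mean = sum(q * wt for (q, _), wt in weights.items()) / norm
w_mean = sum(w * wt for (_, w), wt in weights.items()) / norm

print(f"posterior mean of q: {q_mean:.2f} (simulated with {TRUE_Q})")
print(f"posterior mean of w: {w_mean:.3f} (simulated with {TRUE_W})")
```

In this sketch a larger posterior value of `w` means a steeper drop in sampling rate with distance, that is, a stronger accessibility bias for that factor. Extending it to several bias factors would mean one decay weight per factor, and a real analysis would normally replace the grid with a sampler such as MCMC.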
Summary
Available data sets of geo-referenced species occurrences, such as those provided by the Global Biodiversity Information Facility, have become a fundamental resource in the biological sciences, especially in biogeography, conservation and macroecology. These data sets are typically not collected systematically and rarely include information on collection effort. Physical accessibility by people is omnipresent as a bias factor (Kadmon et al 2004, Engemann et al 2015, Lin et al 2015) across spatial scales, as the commonly used term ‘roadside bias’ testifies. This means that most species observations are made in or near cities, along roads, paths and rivers, and near human settlements. It is crucial that researchers realise the intrinsic biases associated with the data they deal with, especially in cross-taxonomic studies, since occurrence data sets from different taxa are likely to be affected differently by sampling biases due to differences in specimen collection and transportation. Results from sampbias may be used to identify priorities for further collection or digitalization efforts and to assess the reliability of scientific results based on publicly available species distribution data.