Abstract

Opportunistically collected species occurrence data are often used for species distribution models (SDMs) when high-quality data collected through standardized recording protocols are unavailable. While opportunistic data are abundant, uncertainty is usually high, e.g. due to observer effects or a lack of metadata. To increase data quality and improve model performance, we filtered species records based on record attributes that provide information on the observation process or post-entry data validation. Data filtering not only increases the quality of species records but also reduces sample size, a trade-off that remains relatively unexplored. By controlling for sample size in a dataset of 255 species, we were able to explore the combined impact of data quality and sample size on model performance. We applied three data quality filters based on observers' activity, the validation status of a record in the database and the detail of a submitted record, and analyzed changes in AUC, Sensitivity and Specificity using Maxent with and without filtering. The impact of stringent filtering on model performance depended on (1) the quality of the filtered data: records validated as correct and more detailed records led to higher model performance, (2) the proportional reduction in sample size caused by filtering and the remaining absolute sample size: filters causing small reductions that led to sample sizes of more than 100 presences generally benefitted model performance and (3) the taxonomic group: plant and dragonfly models benefitted more from data quality filtering than bird and butterfly models. Our results also indicate that recommendations for quality filtering depend on the goal of the study, e.g. increasing Sensitivity and/or Specificity. Further research must identify what drives species’ sensitivity to data quality. Nonetheless, our study confirms that large quantities of volunteer-generated and opportunistically collected data can make a valuable contribution to ecological research and species conservation.
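
For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of their standard definitions, not the authors' evaluation code. It assumes test points labelled 1 (presence) or 0 (absence or background) and a continuous suitability score per point. The function names and the threshold value are hypothetical, and the abstract does not state which threshold rule was used.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) after thresholding suitability scores.

    A score >= threshold counts as a predicted presence (assumed convention).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Rank-based AUC: the probability that a random presence point scores
    higher than a random absence point, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative, made-up test set: three presences and three absences.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
```

Sensitivity rewards correctly predicted presences and Specificity rewards correctly predicted absences, which is why a filter can improve one while degrading the other.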

Highlights

  • Appropriate conservation measures must mitigate the alarming declines of biodiversity caused by global pressures such as climate change (Urban et al., 2016), invasive species (Early et al., 2016) and intensifying land use (Newbold et al., 2015)

  • Model evaluation metrics were averaged across the 20 repetitions for the fixed sample sizes, and we compared the mean differences in model performance (∆AUC, ∆Sensitivity, ∆Specificity) between models trained on the unfiltered training set and models trained on the filtered training sets
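
The averaging step in the last highlight can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `mean_metric_deltas`, the dictionary layout, and the sign convention (filtered minus unfiltered, so positive values mean filtering helped) are all hypothetical, and the numbers are invented. Only two repetitions are shown for brevity, where the study used 20.

```python
from statistics import mean

METRICS = ("auc", "sensitivity", "specificity")


def mean_metric_deltas(unfiltered_runs, filtered_runs):
    """Average each metric over repetitions, then return the difference
    (filtered mean minus unfiltered mean) per metric."""
    deltas = {}
    for metric in METRICS:
        base = mean(run[metric] for run in unfiltered_runs)
        filt = mean(run[metric] for run in filtered_runs)
        deltas[metric] = filt - base
    return deltas


# Made-up results for one species at one fixed sample size.
unfiltered = [
    {"auc": 0.80, "sensitivity": 0.70, "specificity": 0.75},
    {"auc": 0.82, "sensitivity": 0.72, "specificity": 0.77},
]
filtered = [
    {"auc": 0.85, "sensitivity": 0.74, "specificity": 0.78},
    {"auc": 0.87, "sensitivity": 0.76, "specificity": 0.80},
]
deltas = mean_metric_deltas(unfiltered, filtered)
```

Averaging within each treatment before differencing keeps the comparison at equal sample size, so a non-zero difference reflects data quality rather than the number of records.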

Summary

Introduction

Appropriate conservation measures must mitigate the alarming declines of biodiversity caused by global pressures such as climate change (Urban et al., 2016), invasive species (Early et al., 2016) and intensifying land use (Newbold et al., 2015). Choosing proper conservation measures requires evidence on the state of biodiversity and species’ distributions. Such evidence is gathered through standardised protocols, performed by trained observers and with a clear description of both data collection and project objectives (Kosmala et al., 2016). Such highly structured data are rarely available for a wide range of species, or for extensive periods or geographical areas (Urban et al., 2016). The value of data with information on detectability or information on absences is indisputable and their applications are abundant, e.g. for species distribution models (SDMs).

