Abstract

Citizen science platforms are quickly accumulating hundreds of millions of biodiversity observations around the world annually. Quantifying and correcting for the biases in citizen science datasets remains an important first step before these data are used to address ecological questions and monitor biodiversity. One source of potential bias among datasets is the difference between those citizen science programs that have unstructured protocols and those that have semi-structured or structured protocols for submitting observations. To quantify biases in an unstructured citizen science platform, we contrasted bird observations from the unstructured iNaturalist platform with those from a semi-structured citizen science platform, eBird, for the continental United States. We tested whether four traits of species (body size, commonness, flock size, and color) predicted whether a species was under- or over-represented in the unstructured dataset compared with the semi-structured dataset. We found strong evidence that large-bodied birds were over-represented in the unstructured citizen science dataset; moderate evidence that common species were over-represented in the unstructured dataset; strong evidence that species in large groups were over-represented; and no evidence that colorful species were over-represented in unstructured citizen science data. Our results suggest that biases exist in unstructured citizen science data when compared with semi-structured data, likely as a result of the detectability of a species and the inherent recording process. Importantly, in programs like iNaturalist the detectability process is two-fold: first, an individual organism needs to be detected, and second, it needs to be photographed, which is likely easier for many large-bodied species. Our results indicate that caution is warranted when using unstructured citizen science data in ecological modelling, and highlight body size as a fundamental trait that can be used as a covariate for modelling opportunistic species occurrence records, representing the detectability or identifiability in unstructured citizen science datasets. Future research in this space should continue to focus on quantifying and documenting biases in citizen science data, and expand our research by including structured citizen science data to understand how biases differ among unstructured, semi-structured, and structured citizen science platforms.
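
As a rough illustration of the type of comparison described above (not the authors' actual analysis pipeline), the Python sketch below computes a per-species over/under-representation index, taken here as the log ratio of a species' share of all iNaturalist observations to its share of all eBird checklist records, and regresses that index on log body mass. The file names and column names (inat.csv, ebird.csv, traits.csv, n_observations, n_checklists, body_mass_g) are hypothetical placeholders.

    # Minimal sketch (hypothetical inputs, not the paper's code): relate a species'
    # over/under-representation in iNaturalist relative to eBird to its body size.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical input files:
    #   inat.csv   -> species, n_observations   (iNaturalist observation counts)
    #   ebird.csv  -> species, n_checklists     (eBird checklists reporting the species)
    #   traits.csv -> species, body_mass_g      (body mass from a trait database)
    inat = pd.read_csv("inat.csv")
    ebird = pd.read_csv("ebird.csv")
    traits = pd.read_csv("traits.csv")

    df = inat.merge(ebird, on="species").merge(traits, on="species")

    # Each species' share of all iNaturalist observations and of all eBird checklist records
    df["p_inat"] = df["n_observations"] / df["n_observations"].sum()
    df["p_ebird"] = df["n_checklists"] / df["n_checklists"].sum()

    # Positive values: over-represented in the unstructured (iNaturalist) dataset
    df["representation_index"] = np.log(df["p_inat"] / df["p_ebird"])

    # Simple linear model: does (log) body mass predict over-representation?
    model = smf.ols("representation_index ~ np.log(body_mass_g)", data=df).fit()
    print(model.summary())

The published analysis presumably involved more structure (four traits, state-level replication); the sketch only shows how such an index could be constructed and related to a single trait.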

Highlights

  • Citizen science platforms are quickly accumulating hundreds of millions of biodiversity observations around the world annually

  • We found that the percent of eBird checklists a species was found on and the percent of total iNaturalist observations a species comprised were strongly correlated among states (Figure S2), suggesting that species are sampled to a proportionally similar extent in unstructured and semi-structured citizen science projects (see the sketch after these highlights)

  • Across the 507 species included in our analyses (Table S1), we showed that larger species were more likely to be over-represented in the unstructured citizen science dataset, and this was true in most states, as illustrated by the empirical comparison (Fig. 3a)
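
As referenced in the second highlight, a minimal sketch of the state-level comparison is given below. It assumes a hypothetical table (records.csv) that already holds, for each state and species, the percent of eBird checklists the species was found on and the percent of total iNaturalist observations it comprised; a Spearman rank correlation per state is one reasonable choice for this kind of proportion data, not necessarily the statistic used in the paper.

    # Minimal sketch (hypothetical input): per-state correlation between eBird
    # checklist frequency and iNaturalist observation share for each species.
    import pandas as pd
    from scipy.stats import spearmanr

    # records.csv -> state, species, pct_ebird_checklists, pct_inat_observations
    records = pd.read_csv("records.csv")

    for state, grp in records.groupby("state"):
        rho, p = spearmanr(grp["pct_ebird_checklists"], grp["pct_inat_observations"])
        print(f"{state}: Spearman rho = {rho:.2f} (p = {p:.3g}, n = {len(grp)})")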


Introduction

Statistical solutions to account for known biases and noise inherent in citizen science data are used [3,35,36].
