Evaluating the use of semi-structured crowdsourced data to quantify inequitable access to urban biodiversity: A case study with eBird.

Prashikdivya Gajbhiye, Deja J. Perkins, Paige S. Warren, Aaron M. Grade, Travis Longcore, Nathan W. Chan

doi:10.1371/journal.pone.0277223

Open Access

https://doi.org/10.1371/journal.pone.0277223

Abstract

Credibly estimating social-ecological relationships requires data with broad coverage and fine geographic resolutions that are not typically available from standard ecological surveys. Open and unstructured data from crowdsourced platforms offer an opportunity for collecting large quantities of user-submitted ecological data. However, the representativeness of the areas sampled by these data portals is not well known. We investigate how data availability in eBird, one of the largest and most popular crowdsourced science platforms, correlates with race and income of census tracts in two cities: Boston, MA and Phoenix, AZ. We find that checklist submissions vary greatly across census tracts, with similar patterns within both metropolitan regions. In particular, census tracts with high income and high proportions of white residents are most likely to be represented in the data in both cities, which indicates selection bias in eBird coverage. Our results illustrate the non-representativeness of eBird data, and they also raise deeper questions about the validity of statistical inferences regarding disparities that can be drawn from such datasets. We discuss these challenges and illustrate how sample selection problems in unstructured or semi-structured crowdsourced data can lead to spurious conclusions regarding the relationships between race, income, and access to urban bird biodiversity. While crowdsourced data are indispensable and complementary to more traditional approaches for collecting ecological data, we conclude that unstructured or semi-structured data may not be well-suited for all lines of inquiry, particularly those requiring consistent data coverage, and should thus be handled with appropriate care.

Full Text