Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data

Valerie A Steen, Morgan W Tingley, Peter W C Paton, Chris S Elphick

doi:10.1111/2041-210x.13525

Abstract

Spatial biases are a common feature of presence–absence data from citizen scientists. Spatial thinning can mitigate errors in species distribution models (SDMs) that use these data. When detections or non‐detections are rare, however, SDMs may suffer from class imbalance or low sample size of the minority (i.e. rarer) class. Poor predictions can result, the severity of which may vary by modelling technique. To explore the consequences of spatial bias and class imbalance in presence–absence data, we used eBird citizen science data for 102 bird species from the northeastern USA to compare spatial thinning, class balancing and majority‐only thinning (i.e. retaining all samples of the minority class). We created SDMs using two parametric or semi‐parametric techniques (generalized linear models and generalized additive models) and two machine learning techniques (random forest and boosted regression trees). We tested the predictive abilities of these SDMs using an independent and systematically collected reference dataset with a combination of discrimination (area under the receiver operating characteristic curve; true skill statistic; area under the precision‐recall curve) and calibration (Brier score; Cohen's kappa) metrics. We found large variation in SDM performance depending on thinning and balancing decisions. Across all species, there was no single best approach, with the optimal choice of thinning and/or balancing depending on modelling technique, performance metric and the baseline sample prevalence of species in the data. Spatially thinning all the data was often a poor approach, especially for species with baseline sample prevalence <0.1. For most of these rare species, balancing classes improved model discrimination between presence and absence classes using machine learning techniques, but typically hindered model calibration.
Baseline sample prevalence, sample size, modelling approach and the intended application of SDM output—whether discrimination or calibration—should guide decisions about how to thin or balance data, given the considerable influence of these methodological choices on SDM performance. For prognostic applications requiring good model calibration (vis‐à‐vis discrimination), the match between sample prevalence and true species prevalence may be the overriding feature and warrants further investigation.
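
The three data treatments compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid-based thinning rule (one random record per cell, applied per class), the `(x, y, detected)` record layout, the cell size and all function names are assumptions made for illustration, and the data are synthetic.

```python
import random
from collections import defaultdict

# A record is (x, y, detected): hypothetical projected coordinates plus a
# 0/1 detection flag. This layout is an assumption, not the eBird format.

def grid_thin(records, cell_size, rng):
    """Keep one randomly chosen record per square grid cell (assumed rule)."""
    cells = defaultdict(list)
    for x, y, detected in records:
        cells[(x // cell_size, y // cell_size)].append((x, y, detected))
    return [rng.choice(group) for group in cells.values()]

def split_classes(records):
    """Return (minority, majority) lists of records."""
    presences = [r for r in records if r[2] == 1]
    absences = [r for r in records if r[2] == 0]
    if len(presences) <= len(absences):
        return presences, absences
    return absences, presences

def thin_all(records, cell_size, rng):
    """Spatial thinning: thin both classes on the grid."""
    minority, majority = split_classes(records)
    return grid_thin(minority, cell_size, rng) + grid_thin(majority, cell_size, rng)

def thin_majority_only(records, cell_size, rng):
    """Majority-only thinning: keep every minority record, thin the rest."""
    minority, majority = split_classes(records)
    return minority + grid_thin(majority, cell_size, rng)

def balance_classes(records, rng):
    """Class balancing: downsample the majority class to the minority size."""
    minority, majority = split_classes(records)
    return minority + rng.sample(majority, len(minority))

def prevalence(records):
    """Sample prevalence: fraction of records with a detection."""
    return sum(r[2] for r in records) / len(records)

# Synthetic example: 20 detections among 400 records (prevalence 0.05).
rng = random.Random(0)
records = [(rng.uniform(0, 10), rng.uniform(0, 10), 1 if i < 20 else 0)
           for i in range(400)]

thinned_all = thin_all(records, cell_size=1.0, rng=rng)
thinned_majority = thin_majority_only(records, cell_size=1.0, rng=rng)
balanced = balance_classes(records, rng)

for name, data in [("baseline", records), ("thin all", thinned_all),
                   ("majority-only", thinned_majority), ("balanced", balanced)]:
    print(f"{name}: n={len(data)}, prevalence={prevalence(data):.3f}")
```

In this sketch each treatment changes both the sample size and the sample prevalence of the training data, which is the quantity the abstract flags as potentially decisive for calibration.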

Journal: Methods in Ecology and Evolution
Publication Date: Dec 2, 2020
Citations: 56

Similar Papers

Leaving the area under the receiving operating characteristic curve behind: An evaluation method for species distribution modelling applications based on presence‐only data
Laura Jiménez ... Jorge Soberón
Methods in Ecology and Evolution | VOL. 11
13 Oct 2020

Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models
Tianxiao Hao ... Jane Elith
Ecography | VOL. 43
27 Jan 2020

Interannual climate variability improves niche estimates for ectothermic but not endothermic species
Dirk Nikolaus Karger ... Niklaus E Zimmermann
Scientific Reports | VOL. 13
02 Aug 2023

Performance metrics and variance partitioning reveal sources of uncertainty in species distribution models
James I Watling ... Carolina Speroterra
Ecological Modelling | VOL. 309-310
15 May 2015