Abstract

Model evaluation metrics play a critical role in the selection of adequate species distribution models for conservation and for any application of species distribution modelling (SDM) in general. The responses of these metrics to modelling conditions, however, are rarely taken into account. This leads to inadequate model selection and downstream analyses, and to uninformed decisions. To aid modellers in critically assessing modelling conditions when choosing and interpreting model evaluation metrics, we analysed the responses of the True Skill Statistic (TSS) under a variety of presence-background modelling conditions using purely theoretical scenarios. We then compared these responses with those of two evaluation metrics commonly applied in the field of meteorology which have potential for use in SDM: the Odds Ratio Skill Score (ORSS) and the Symmetric Extremal Dependence Index (SEDI). We demonstrate that (1) large cell number totals in the confusion matrix, which is strongly biased towards ‘true’ absences in presence-background SDM, and (2) low prevalence both compromise model evaluation with TSS. This is because (1) TSS fails to differentiate useful from random models at extreme prevalence levels if the confusion matrix cell number total exceeds ~30,000 cells and (2) TSS converges to hit rate (sensitivity) when prevalence is lower than ~2.5%. We conclude that SEDI is optimal for most presence-background SDM initiatives. Further, ORSS may provide a better alternative if absence data are available or if equal error weighting is strictly required.
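The three metrics can be made concrete with a minimal sketch that computes them from the four confusion matrix cells. It assumes the standard definitions from the forecast verification literature, with hit rate H = a/(a+c) and false alarm rate F = b/(b+d), where a, b, c and d are true positives, false positives, false negatives and true negatives. The function names and the example counts are hypothetical illustrations, not values from the study.

```python
import math


def hit_rate(a, c):
    """H (sensitivity): a = true positives, c = false negatives."""
    return a / (a + c)


def false_alarm_rate(b, d):
    """F (1 - specificity): b = false positives, d = true negatives."""
    return b / (b + d)


def tss(a, b, c, d):
    """True Skill Statistic: H - F."""
    return hit_rate(a, c) - false_alarm_rate(b, d)


def orss(a, b, c, d):
    """Odds Ratio Skill Score: (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


def sedi(a, b, c, d):
    """Symmetric Extremal Dependence Index.

    Undefined when H or F is exactly 0 or 1, because the
    logarithms diverge; this sketch does not handle that case.
    """
    h = hit_rate(a, c)
    f = false_alarm_rate(b, d)
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return num / den


# Hypothetical low-prevalence case: 50 presences among 100,000 cells.
cells = dict(a=40, b=500, c=10, d=99450)
print("H    =", hit_rate(cells["a"], cells["c"]))
print("TSS  =", tss(**cells))
print("ORSS =", orss(**cells))
print("SEDI =", sedi(**cells))
```

In this made-up example F is very small because the true negatives dominate the matrix, so TSS sits almost on top of H, which is the convergence behaviour the abstract describes for low prevalence.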

Highlights

  • Species Distribution Modelling (SDM) relates independent environmental variables to species occurrence data and, in turn, predicts a dependent variable such as probability or the relative likelihood of occurrence (Guisan and Zimmermann 2000; Peterson 2001; Guillera-Arroita et al 2015)

  • We have shown that the True Skill Statistic (TSS), Odds Ratio Skill Score (ORSS) and Symmetric Extremal Dependence Index (SEDI), as well as their underlying evaluation measures (H and F, see F in Table 2), show distinct responses to: 1) increasing size of the study area and growing numbers of background points, even when prevalence is kept constant; 2) the direction of bias as prevalence decreases and the extent of the study area and cell number totals increase; and 3) changes in bias as prevalence decreases and the extent of the study area and cell number totals increase

  • We focused on the importance of model evaluation in the context of ecology and conservation

Summary

Introduction

Species Distribution Modelling (SDM) relates independent environmental variables to species occurrence data and, in turn, predicts a dependent variable such as probability or the relative likelihood of occurrence (Guisan and Zimmermann 2000; Peterson 2001; Guillera-Arroita et al 2015). Although SDM predictions mostly range from zero to one, they are often discretised into binary presence-absence maps (i.e. comprising only zeros and ones). These maps are used to evaluate wildlife management options, to identify appropriate conservation translocation sites and to evaluate model performance with confusion matrix-based performance metrics (Willis et al 2009; Fordham et al 2012; Liu et al 2013). ‘Observed false absences’, on the other hand, are artefactual in nature, resulting from insufficient monitoring relative to species movement (Tyre et al 2003) or imperfect detection (MacKenzie et al 2002). Whereas both true and false absences can lead to ‘zero-inflated’ datasets (Heilbron 1994) that violate statistical assumptions, the latter are a source of uncertainty in parameter estimates, as artefactual signals (e.g. sampling bias, probability of detection) confound estimates of probability of occurrence (MacKenzie et al 2002).
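The discretisation step described above can be sketched as follows. This is a minimal illustration, assuming a fixed threshold of 0.5 (an arbitrary choice, not one taken from the source) and a hypothetical helper named `confusion_matrix`; in a presence-background setting the zeros in `observed` would be background points rather than confirmed absences.

```python
def confusion_matrix(scores, observed, threshold=0.5):
    """Discretise continuous predictions and tally the four cells.

    scores:    continuous SDM predictions in [0, 1]
    observed:  1 for presence, 0 for absence (or background)
    Returns (a, b, c, d) = (true positives, false positives,
                            false negatives, true negatives).
    """
    a = b = c = d = 0
    for score, obs in zip(scores, observed):
        predicted_present = score >= threshold
        if predicted_present and obs:
            a += 1  # predicted present, observed present
        elif predicted_present and not obs:
            b += 1  # predicted present, observed absent
        elif not predicted_present and obs:
            c += 1  # predicted absent, observed present
        else:
            d += 1  # predicted absent, observed absent
    return a, b, c, d


# Hypothetical toy data: five cells, two of them presences.
scores = [0.9, 0.2, 0.7, 0.4, 0.1]
observed = [1, 1, 0, 0, 0]
print(confusion_matrix(scores, observed))
```

The resulting cell counts are the inputs that confusion matrix-based metrics such as TSS, ORSS and SEDI are computed from, so the choice of threshold feeds directly into every one of them.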
