Abstract

The area under the curve (AUC) of the receiver operating characteristic (ROC) graph is regarded as an objective measure of the discrimination accuracy of predictive models. AUC scores calculated from background values, or pseudo-absences, have been proposed as a method of model selection for species distribution models (SDMs) fitted to presence-only data. However, the utility of AUC as a measure of model performance when data on confirmed absences are unavailable has not been fully investigated. We fitted SDMs using informative climatic variables for 2000 species of Mesoamerican trees. As a reference, we also built ‘pseudo-models’ using Gaussian random fields with no biological meaning. AUC correctly selected SDMs fitted to single environmental variables over ‘pseudo-models’ fitted to single random fields in almost all cases. However, when all seven variables were included in the models, AUC erroneously selected complex pseudo-models over complex climate models in 17% of the cases. The spatial distribution patterns predicted by the pseudo-models differed from those predicted by the climate-based models, even when overall AUC scores were similar. Both model and pseudo-model AUC values increased when presence points were few and spatially aggregated. The results show that AUC calculated from presence-only data can be an unreliable guide for model selection. Pseudo-absences have ill-defined properties that challenge the interpretation of AUC values. Inference on multidimensional niche spaces should not be supported by AUC values calculated using pseudo-absences.
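
To make the quantity under discussion concrete, the following minimal sketch computes an AUC from presence points and background points (pseudo-absences). It uses the standard rank interpretation of AUC: the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen background point, with ties counted as one half. The function name and the suitability scores are hypothetical and chosen for illustration only; they are not taken from the study.

```python
def auc_presence_background(presence_scores, background_scores):
    """Return the AUC as the fraction of (presence, background) pairs
    in which the presence point has the higher predicted score.
    Tied pairs contribute 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical predicted suitability values (illustrative only).
presence = [0.9, 0.8, 0.7, 0.4]          # scores at known presence points
background = [0.6, 0.5, 0.3, 0.2, 0.1]   # scores at pseudo-absence points

auc = auc_presence_background(presence, background)
print(f"AUC = {auc:.2f}")
```

Because the background points are not confirmed absences, a high value here only shows that presences score higher than the sampled background, which is the interpretive difficulty the abstract describes.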
