Abstract

Methods that predict the distributions of species and habitats by developing statistical relationships between observed occurrences and environmental gradients have become common tools in environmental research, resource management, and conservation. The uptake of model predictions in practical applications remains limited, however, because validation against independent sample data is rarely practical, especially at larger spatial scales and in poorly sampled environments. Here, we use a quantitative dataset of benthic invertebrate faunal distributions from seabed photographic surveys of an important fisheries area in New Zealand as independent data against which to assess the usefulness of 47 habitat suitability models from eight published studies in the region. When assessed against the independent data, model performance was lower than in published cross-validation values, a trend of increasing performance over time seen in published metrics was not supported, and while 74% of the models were potentially useful for predicting presence or absence, correlations with prevalence and density were weak. We investigate the reasons underlying these results, using recently proposed standards to identify areas in which improvements can best be made. We conclude that commonly used cross-validation methods can yield inflated values of prediction success even when spatial structure in the input data is allowed for, and that the main impediments to prediction success are likely to include unquantified uncertainty in available predictor variables, lack of some ecologically important variables, lack of confirmed absence data for most taxa, and modeling at coarse taxonomic resolution.

Highlights

  • Understanding and managing ecosystem effects of human activities, such as bottom-contact fishing and mineral extraction in the deep sea, requires quantitative information on the distributions of benthic habitats and fauna (Kaiser et al., 2016; Pitcher et al., 2017).

  • We have used independent data from seabed photographic surveys to explore the general utility of habitat suitability models that we have developed over more than 10 years with the aim of predicting distributions of seafloor taxa in the southwest Pacific, centered on New Zealand.

  • The key results of our assessment are that (1) measured model performance was lower when assessed against independent data than by k-fold cross-validation for all but two of 47 models; (2) a trend of increasing model performance with time, which is seen in published cross-validation AUC (AUCkcv) values and is anticipated when the methods used in these studies are judged against objective criteria, is not supported when the models are tested against independent data; (3) for approximately 72% of the models, the predicted probability of suitable habitat was significantly higher at sites where a taxon was present in the independent data than at sites where it was absent (see the sketch below); and (4) correlations between predicted probability of presence and observed taxon prevalence and density were weak.
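The evaluation steps in points (3) and (4) can be illustrated with a short, self-contained sketch. This is not the study's code: the predictions and survey observations below are simulated placeholders, and the use of a one-sided Mann-Whitney U test (whose statistic also scales to AUC against the independent data) and a Spearman rank correlation is an assumption about how such comparisons would typically be made.

```python
# Illustrative sketch only (not the authors' code): simulated stand-ins for model
# predictions and independent seabed-photo observations.
import numpy as np
from scipy.stats import mannwhitneyu, spearmanr

rng = np.random.default_rng(0)

# Predicted probability of suitable habitat at independently surveyed sites.
pred_present = rng.beta(4, 2, size=60)    # sites where the taxon was observed
pred_absent = rng.beta(2, 4, size=140)    # sites where it was not observed

# Point (3): is predicted suitability higher where the taxon was present?
u_stat, p_presence = mannwhitneyu(pred_present, pred_absent, alternative="greater")
auc_independent = u_stat / (len(pred_present) * len(pred_absent))  # U scales to AUC

# Point (4): rank correlation between predicted suitability and observed density.
predicted = np.concatenate([pred_present, pred_absent])
observed_density = rng.poisson(5 * predicted)                      # counts per photo site
rho, p_density = spearmanr(predicted, observed_density)

print(f"Mann-Whitney U = {u_stat:.0f}, p = {p_presence:.3g}, AUC = {auc_independent:.2f}")
print(f"Spearman rho = {rho:.2f}, p = {p_density:.3g}")
```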



Introduction

Understanding and managing ecosystem effects of human activities, such as bottom-contact fishing and mineral extraction in the deep sea (depths greater than ca. 200 m), requires quantitative information on the distributions of benthic habitats and fauna (Kaiser et al., 2016; Pitcher et al., 2017). The relative paucity, patchiness, and taxonomic selectivity of available faunal sample data in the deep sea, a lack of spatial resolution and local validation of environmental layers, and limited understanding of the biotic interactions and historical factors that might influence present distributions can, in combination, result in high levels of uncertainty being associated with the outputs of habitat suitability models (Fielding and Bell, 1997; Araújo and Guisan, 2006; Vierod et al., 2014; Reiss et al., 2015; Anderson et al., 2016a). This uncertainty is exacerbated by the cross-validation methods commonly used to evaluate model performance, in which subsets of the input taxon occurrence data are withheld from the model and used as test sites to assess predictions. While this approach is practical, it can generate overly optimistic values of model performance (Bahn and McGill, 2013; Ploton et al., 2020) that may not be supported by field validation (Anderson et al., 2016a), because the data used in cross-validation are not independent of those used to build the model and are likely to be spatially biased (e.g., Lobo et al., 2008; Ploton et al., 2020).
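To make this mechanism concrete, the sketch below (our illustration, not code from any of the studies cited above) contrasts ordinary random k-fold cross-validation with spatially blocked cross-validation for a presence/absence model fitted to simulated, spatially clustered survey data; the random forest, the cluster-based blocks, and the simulated predictors with an unmeasured spatial effect are all assumptions made for the example.

```python
# Minimal sketch (our illustration, not code from the cited studies): simulated,
# spatially clustered survey data with an unmeasured spatial effect on occurrence.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(1)
n_clusters, per_cluster = 40, 25
n = n_clusters * per_cluster
cluster = np.repeat(np.arange(n_clusters), per_cluster)

# Environmental predictors vary mostly between clusters (spatial autocorrelation).
depth = np.repeat(rng.uniform(200, 1200, n_clusters), per_cluster) + rng.normal(0, 5, n)
slope = np.repeat(rng.uniform(0, 20, n_clusters), per_cluster) + rng.normal(0, 0.5, n)

# Occurrence depends on the environment plus a spatial effect the model never sees.
spatial_effect = np.repeat(rng.normal(0, 1.5, n_clusters), per_cluster)
logit = -2.0 + 0.003 * depth - 0.10 * slope + spatial_effect
presence = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X = np.column_stack([depth, slope])

model = RandomForestClassifier(n_estimators=300, random_state=1)

# Random k-fold: test sites sit beside training sites from the same cluster.
auc_random = cross_val_score(model, X, presence,
                             cv=KFold(5, shuffle=True, random_state=1),
                             scoring="roc_auc").mean()

# Spatially blocked folds: whole clusters are withheld together.
auc_blocked = cross_val_score(model, X, presence, cv=GroupKFold(5),
                              scoring="roc_auc", groups=cluster).mean()

print(f"random k-fold AUC : {auc_random:.2f}")   # optimistic
print(f"spatial-block AUC : {auc_blocked:.2f}")  # usually noticeably lower
```

Because random folds leave near neighbours of each test site in the training set, the model can exploit spatial structure that is not captured by the environmental predictors, so the randomly cross-validated AUC is typically higher than the spatially blocked estimate.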
