Effects of sample size on accuracy of species distribution models

David R.B. Stockwell, A. Townsend Peterson

doi:10.1016/s0304-3800(01)00388-x

Abstract

Given increasing access to large amounts of biodiversity information, the ability to model ecological niches and predict geographic distributions is a powerful capability. Because sampling species’ distributions is costly, we explored the sample sizes needed for accurate modeling with three predictive modeling methods by re-sampling data for well-sampled species, and developed curves of model improvement with increasing sample size. In general, under a coarse surrogate model and machine-learning methods, the average success rate at predicting occurrence of a species at a location (accuracy) reached 90% of its maximum within ten sample points and was near maximal at 50 data points. However, a fine surrogate model and a logistic regression model had significantly lower rates of increase in accuracy with increasing sample size, reaching similar maximum accuracy at 100 data points. The choice of environmental variables also produced unpredictable effects on accuracy over the range of sample sizes for the logistic regression method, while the machine-learning method performed robustly throughout. Among the correlates of model performance examined across species, extent of geographic distribution was the only significant ecological factor.
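The re-sampling idea described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' methods or data: the sites are synthetic, the single environmental variable and its 10 to 20 presence range are invented for illustration, and a simple min/max environmental envelope stands in for the surrogate, machine-learning, and logistic regression models. All function names are hypothetical.

```python
import random

def make_synthetic_sites(n_sites, rng):
    """Hypothetical data: one environmental value per site.
    Assumption: the species is present where 10 <= value <= 20."""
    sites = []
    for _ in range(n_sites):
        value = rng.uniform(0.0, 30.0)
        sites.append((value, 10.0 <= value <= 20.0))
    return sites

def fit_envelope(presence_values):
    """Stand-in model: the min/max range of values at sampled presence points."""
    return min(presence_values), max(presence_values)

def accuracy(envelope, test_sites):
    """Fraction of test sites whose presence/absence is predicted correctly."""
    low, high = envelope
    correct = sum((low <= value <= high) == present
                  for value, present in test_sites)
    return correct / len(test_sites)

def learning_curve(train_sites, test_sites, sample_sizes, n_repeats, rng):
    """Mean accuracy at each sample size, averaged over repeated re-samples."""
    presences = [value for value, present in train_sites if present]
    curve = {}
    for n in sample_sizes:
        scores = []
        for _ in range(n_repeats):
            sample = rng.sample(presences, n)  # draw n presence points without replacement
            scores.append(accuracy(fit_envelope(sample), test_sites))
        curve[n] = sum(scores) / n_repeats
    return curve

rng = random.Random(0)  # fixed seed so repeated runs give the same curve
sites = make_synthetic_sites(2000, rng)
train_sites, test_sites = sites[:1000], sites[1000:]  # held-out sites for scoring
curve = learning_curve(train_sites, test_sites, [2, 5, 10, 50, 100], 200, rng)
for n, mean_acc in curve.items():
    print(f"n={n:4d}  mean accuracy={mean_acc:.3f}")
```

Dividing each value in `curve` by the largest value gives accuracy as a fraction of the maximum, which is the form in which the abstract reports its thresholds.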
