How many bootstrap replications are necessary for estimating remote sensing-assisted, model-based standard errors?

Ronald E. McRoberts, Erik Næsset, Zhengyang Hou, Göran Ståhl, Svetlana Saarela, Jessica Esteban, Davide Travaglini, Jahangir Mohammadi, Gherardo Chirici

doi:10.1016/j.rse.2023.113455

Abstract

When probability samples are not available, the model-based framework may be the only option for constructing inferences in the form of prediction intervals for population means. Further, for machine learning and some non-parametric and nonlinear regression prediction techniques, resampling methods such as the bootstrap may be the only option for obtaining the standard errors necessary for constructing those prediction intervals. All bootstrap approaches entail repeatedly sampling from the original sample, estimating the parameter of interest for each replication, and estimating the standard error of the parameter estimate as the standard deviation of the bootstrap estimates over replications. The objective of the study was to develop a procedure for terminating resampling such that the resulting number of replications assures, at least in probability, that the estimate of the standard error stabilizes to the standard error corresponding to one million replications. The analyses used seven datasets of two kinds. Five were forest inventory datasets, three from Europe, one from Southwest Asia, and one from Africa, with either volume or aboveground biomass as the dependent variable and metrics from either airborne laser scanning or Landsat as independent variables. Two were forest/non-forest versus Landsat datasets, one from Minnesota and one from Wisconsin, both in the USA. The primary contribution of the study was development and demonstration of a procedure that specifies criteria for terminating resampling that assure in probability that the bootstrap estimate of the standard error stabilizes to the estimate obtained with one million replications.
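The generic bootstrap steps described above (resample, re-estimate, take the standard deviation over replications) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic data, the simple linear model, the pairs (case) resampling scheme, and the hypothetical function names `bootstrap_mean_estimates` and `running_se`. The sketch also does not reproduce the study's termination criteria; it only shows how the standard error estimate evolves as replications accumulate.

```python
import numpy as np

def bootstrap_mean_estimates(x, y, x_pop, n_reps=500, seed=0):
    """Pairs bootstrap of a model-based population mean (illustrative only)."""
    rng = np.random.default_rng(seed)
    x, y, x_pop = np.asarray(x), np.asarray(y), np.asarray(x_pop)
    n = len(y)
    design = np.column_stack([np.ones(n), x])               # sample design matrix
    design_pop = np.column_stack([np.ones(len(x_pop)), x_pop])
    estimates = np.empty(n_reps)
    for b in range(n_reps):
        idx = rng.integers(0, n, size=n)                     # resample with replacement
        beta, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        estimates[b] = (design_pop @ beta).mean()            # predicted population mean
    return estimates

def running_se(estimates):
    """Standard deviation (ddof=1) of the first k estimates, for k = 1..n_reps."""
    centered = estimates - estimates.mean()                  # shift for numerical stability
    k = np.arange(1, len(centered) + 1)
    s1 = np.cumsum(centered)
    s2 = np.cumsum(centered ** 2)
    var = (s2 - s1 ** 2 / k) / np.maximum(k - 1, 1)
    var[0] = np.nan                                          # undefined for one replication
    return np.sqrt(np.maximum(var, 0.0))

# Synthetic stand-in for a sample and an auxiliary-variable population (assumed values).
rng = np.random.default_rng(42)
x = rng.uniform(0, 30, 200)
y = 5 + 8 * x + rng.normal(0, 20, 200)
x_pop = rng.uniform(0, 30, 5000)

estimates = bootstrap_mean_estimates(x, y, x_pop, n_reps=500)
se_path = running_se(estimates)
print("SE after 50, 200, 500 replications:", se_path[[49, 199, 499]])
```

Plotting or inspecting `se_path` shows the estimate fluctuating over early replications and then flattening; this is the kind of behavior a termination rule would monitor.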

Full Text