Abstract
In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. We adopt the multinomial sampling formulation of the non-parametric bootstrap and compute bootstrap replications of sample moment statistics by weighting the observed data according to multinomial counts, instead of evaluating the statistic on a resampled version of the observed data. Using this formulation, we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the vectorized implementation on real and simulated data sets by bootstrapping Pearson’s sample correlation coefficient, and compare its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and numbers of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably faster for small sample sizes and considerably faster for moderate ones. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation because of the additional time spent generating weight matrices via multinomial sampling.
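The weighting idea described in the abstract can be sketched as follows. This is a minimal illustration written in Python with NumPy rather than in R, which is the language the paper works in; it is not the authors' code. The function name `vectorized_bootstrap_corr`, the seeds, and the simulated data are assumptions made for the example. Each row of the weight matrix holds the multinomial counts of one resample divided by the sample size, so five matrix-vector products give the weighted first and second moments for every replication at once.

```python
import numpy as np


def vectorized_bootstrap_corr(x, y, n_boot, seed=None):
    """Bootstrap replications of Pearson's correlation via multinomial weights.

    Hypothetical helper for illustration; returns the replications and the
    multinomial count matrix that generated them.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    rng = np.random.default_rng(seed)

    # Each row holds the counts of one bootstrap resample: how many times
    # each of the n observations was drawn in n draws with replacement.
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    weights = counts / n  # shape (n_boot, n); every row sums to 1

    # Weighted first and second sample moments for all replications at once.
    m_x = weights @ x
    m_y = weights @ y
    m_xx = weights @ (x * x)
    m_yy = weights @ (y * y)
    m_xy = weights @ (x * y)

    # Pearson's correlation expressed through the weighted moments.
    cov_xy = m_xy - m_x * m_y
    var_x = m_xx - m_x ** 2
    var_y = m_yy - m_y ** 2
    return cov_xy / np.sqrt(var_x * var_y), counts


# Simulated data (an assumption for this example, not the paper's data).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)

replications, counts = vectorized_bootstrap_corr(x, y, n_boot=1000, seed=1)
print(replications.std())  # bootstrap estimate of the standard error
```

Because the correlation coefficient is a ratio in which the normalizing constants cancel, each weighted replication should match the coefficient computed on the corresponding resampled data set. The weight matrix has one row per replication and one column per observation, so its size grows with the sample size, which is consistent with the slowdown the abstract reports for large samples.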
Highlights
Since its introduction, the bootstrap [1] has become perhaps the most popular statistical tool for assessing uncertainty of unknown quantities in situations where analytical solutions are not available, or where modeling assumptions and asymptotic approximations are invalid.
The performance difference observed in these two examples suggests that the gain in speed achieved by the vectorized implementation decreases as the sample size increases.
The resampling implementation based on the “for loop” was generally faster than the implementations provided by the bootstrap and boot R packages.
Summary
The bootstrap [1] has become perhaps the most popular statistical tool for assessing uncertainty of unknown quantities in situations where analytical solutions are not available, or where modeling assumptions and asymptotic approximations are invalid. In addition to this basic non-parametric approach, a multitude of alternative bootstrap schemes have been proposed in the literature to account for model-specific characteristics, including parametric and semi-parametric bootstrapping techniques [2, 3]. In regression models, several distinct approaches based on resampling of residuals have been proposed [2, 3, 4], in addition to the simple non-parametric bootstrap, which is known as the paired or case bootstrap in this context.