Fast and robust bootstrap in analysing large multivariate datasets

Shahab Basiri, Visa Koivunen, Esa Ollila

doi:10.1109/acssc.2014.7094385

Abstract

In this paper we address the problem of performing statistical inference for large-scale data sets. The volume and dimensionality of the data may be so high that the data cannot be processed or stored on a single node. We propose a scalable, statistically robust, and computationally efficient bootstrap method that is compatible with distributed processing and storage systems. Bootstrapping is performed on multiple smaller, distinct subsets of the data, similarly to the bag of little bootstraps (BLB) method [1]. For each bootstrap replica drawn from a distinct data subset, a computationally efficient fixed-point estimation equation is solved. The proposed bootstrap method makes it feasible to use highly robust statistical methods in analyzing large-scale data sets. Significant savings in computation are achieved because the method does not recompute the estimator for each bootstrap sample; instead, the bootstrap estimate is obtained analytically using an approximation. Simulation examples demonstrate the usefulness and validity of the method for bootstrap analysis of large data sets.
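The subset-and-resample scheme that the abstract borrows from BLB can be sketched roughly as follows. This is an illustration only, under loud assumptions: the estimator is a plain sample mean, the quality measure is its standard error, and the function name `blb_standard_error` and its parameters are hypothetical. It does not include the robust fixed-point estimator or the analytical approximation proposed in the paper.

```python
import numpy as np


def blb_standard_error(x, n_subsets=5, n_replicas=50, seed=0):
    """BLB-style standard error of the sample mean (illustrative sketch).

    x may have shape (n,) or (n, p). The estimator here is a plain mean,
    an assumption for illustration, not the paper's robust estimator.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Shuffle the rows, then split them into disjoint subsets
    # (each subset could live on a different node).
    subsets = np.array_split(rng.permutation(x), n_subsets)
    per_subset = []
    for sub in subsets:
        b = sub.shape[0]
        # A replica of full size n drawn from only b distinct points is
        # represented by multinomial counts, so it never has to be stored.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_replicas)
        # Weighted mean for every replica at once.
        estimates = counts @ sub / n
        per_subset.append(estimates.std(axis=0, ddof=1))
    # Average the per-subset quality measure over all subsets.
    return np.mean(per_subset, axis=0)
```

For standard normal data of size n, the returned value should be close to 1/sqrt(n), which gives a quick sanity check on the sketch.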


