The sample mean $\overline{X}$ is probably the most popular estimator of the expected value across the sciences, and $\mathrm{var}(\overline{X})$ quantifies its error (both the standard error and the mean-square error). Here, an alternative approach to estimating $\mathrm{var}(\overline{X})$ for time series data is presented. The method has an accuracy similar to dependent bootstrapping, but scales in $O(n)$ time and applies to stationary time series, including stationary Markov chains. The computational complexity is bounded by $12n$ floating point operations, and this can be reduced to $n+O(1)$ in large computations. Convergence in relative squared error is faster than $n^{-1/2}$, and the method is insensitive to the probability distribution of the observations. It is proven that only a small part of the correlation structure is relevant to the convergence rate of the method. From this, a proof of the Blocking method [Flyvbjerg and Petersen, J. Chem. Phys. 91, 461 (1989)] follows as a corollary. The result is also used to propose a hypothesis test that surveys the relevant part of the correlation structure, yielding a fully automatic method that is sufficiently robust to operate without supervision. An algorithm and sample code showing the implementation are available for Python, C++, and R [www.github.com/computative/block]. Method validation using autoregressive AR(1) and AR(2) processes and physics applications is included, and method self-evaluation is provided by bias and mean-square-error statistics. The method is easily adapted to multithreaded applications and to data larger than computing-cluster memory, such as ultralong time series or data streams. In this way, the paper provides a stringent and modern treatment of the Blocking method using rigorous linear algebra, multivariate probability theory, real analysis, and Fisherian statistical inference.