Distributed statistical estimation and rates of convergence in normal approximation

Stanislav Minsker

doi:10.1214/19-ejs1647

Abstract

This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between the performance of these distributed algorithms and the rates of convergence in normal approximation, and prove non-asymptotic deviation guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator, and provide performance guarantees for distributed maximum likelihood estimation.

Highlights

According to (IBM, 2015), “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.”
Most previous research focused on the following question: how significantly does the loss incurred by splitting the data affect the quality of statistical estimation when compared to an “oracle” that has access to the whole sample?
The question that we ask in this paper is different: what can be gained from randomly splitting the data across several subsamples? What are the statistical advantages of the divide-and-conquer framework? Our work indicates that one of the key benefits of an appropriate merging strategy is robustness.

Summary

Introduction

According to (IBM, 2015), “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.” Existing results for the median-based merging strategies have several pitfalls related to the deviation rates, and in most cases the known guarantees are suboptimal. These guarantees suggest that estimators obtained via the median-based approach are very sensitive to the choice of k, the number of partitions. The location parameter of a symmetric distribution admits many robust estimators of the form (1), the sample median being a notable example. This intuition allows us to establish a parallel between the non-asymptotic deviation guarantees for distributed estimation procedures of the form (1) and the degree of symmetry of “local” estimators, quantified by the rates of convergence in normal approximation.
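The median-based merging strategy described above can be illustrated with a minimal sketch of the median-of-means estimator named in the abstract. Equation (1) is not reproduced in this summary, so the code assumes the simplest instance: the data are split at random into k subsamples, each "local" estimator is the subsample mean, and the local estimators are merged by taking their median. The function name `median_of_means`, the `seed` parameter, and the simulated data are illustrative assumptions, not taken from the paper.

```python
import random
import statistics


def median_of_means(data, k, seed=None):
    """Illustrative sketch: median of k subsample means after a random split."""
    data = list(data)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the sample size")
    rng = random.Random(seed)
    rng.shuffle(data)  # random assignment of points to subsamples
    # Subsample i takes every k-th shuffled point starting at position i,
    # so subsample sizes differ by at most one.
    local_means = [statistics.fmean(data[i::k]) for i in range(k)]
    return statistics.median(local_means)


# Simulated data (an assumption for illustration): true mean 1.0,
# with 20 of the 10,000 observations replaced by gross outliers.
rng = random.Random(0)
sample = [rng.gauss(1.0, 1.0) for _ in range(10_000)]
for i in range(20):
    sample[i] = 1e6

naive = statistics.fmean(sample)
robust = median_of_means(sample, k=100, seed=1)
print("sample mean:     ", naive)
print("median of means: ", robust)
```

With k = 100 subsamples and only 20 corrupted points, at most 20 subsample means can be contaminated, so the median is still determined by uncorrupted subsamples, while the plain sample mean is pulled far from the true value. The sensitivity to k mentioned above shows up here as well: a smaller k gives each subsample more data but tolerates fewer contaminated subsamples.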

Journal: Electronic Journal of Statistics
Publication Date: Jan 1, 2019
Citations: 40
License type: cc-by
