Abstract

This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between the performance of these distributed algorithms and the rates of convergence in normal approximation, and prove non-asymptotic deviation guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator and provide performance guarantees for distributed maximum likelihood estimation.
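
As a concrete reference point for the median-of-means estimator mentioned above, here is a minimal Python sketch. It assumes the standard construction: split the sample into k disjoint groups, average within each group, and take the median of the group means. The helper names `split_into_groups` and `median_of_means` are hypothetical and not taken from the paper.

```python
import random
import statistics
from typing import List, Optional, Sequence


def split_into_groups(
    data: Sequence[float], k: int, seed: Optional[int] = None
) -> List[List[float]]:
    """Split `data` into k disjoint groups whose sizes differ by at most one.

    If `seed` is given, the data are randomly permuted first (random splitting);
    otherwise the original order is kept, which makes the output deterministic.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    items = list(data)
    if seed is not None:
        random.Random(seed).shuffle(items)
    base, extra = divmod(len(items), k)
    groups, start = [], 0
    for j in range(k):
        # The first `extra` groups receive one additional observation.
        size = base + (1 if j < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


def median_of_means(data: Sequence[float], k: int, seed: Optional[int] = None) -> float:
    """Median of the k group means (the median-of-means estimator)."""
    group_means = [statistics.fmean(g) for g in split_into_groups(data, k, seed)]
    return statistics.median(group_means)


if __name__ == "__main__":
    sample = [1.0] * 9 + [1000.0]        # one gross outlier
    print(statistics.fmean(sample))       # the sample mean is dragged to 100.9
    print(median_of_means(sample, k=5))   # groups of 2; only one group is affected, prints 1.0
```

A single outlier can corrupt at most one group mean, so the median of the k group means is unaffected as long as fewer than half of the groups are contaminated.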

Highlights

  • According to IBM (2015), “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.”

  • Most previous research focused on the following question: how significantly does the loss incurred by splitting the data affect the quality of statistical estimation when compared to an “oracle” that has access to the whole sample?

  • The question that we ask in this paper is different: what can be gained from randomly splitting the data across several subsamples? What are the statistical advantages of the divide-and-conquer framework? Our work indicates that one of the key benefits of an appropriate merging strategy is robustness.
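
To make the robustness point concrete, the sketch below randomly splits a sample across k hypothetical “machines”, computes a local estimate on each, and merges the local estimates either by averaging or by taking their median. The function name `divide_and_conquer`, the strided split, and the single corrupted observation are illustrative assumptions, not the paper's experimental setup.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence


def divide_and_conquer(
    data: Sequence[float],
    k: int,
    local_estimator: Callable[[List[float]], float],
    merge: Callable[[List[float]], float],
    seed: Optional[int] = None,
) -> float:
    """Split `data` into k subsamples, estimate locally, then merge.

    `local_estimator` plays the role of the computation done on each machine;
    `merge` is the rule used to combine the k local estimates.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    items = list(data)
    if seed is not None:
        random.Random(seed).shuffle(items)  # random assignment to machines
    # Strided slicing yields k disjoint subsamples whose sizes differ by at most one.
    subsamples = [items[j::k] for j in range(k)]
    local_estimates = [local_estimator(s) for s in subsamples]
    return merge(local_estimates)


if __name__ == "__main__":
    # 99 "clean" observations equal to 2.0 plus one corrupted value.
    data = [2.0] * 99 + [1e9]
    averaged = divide_and_conquer(data, 9, statistics.fmean, statistics.fmean, seed=1)
    med = divide_and_conquer(data, 9, statistics.fmean, statistics.median, seed=1)
    print(f"merge by averaging: {averaged:.1f}")  # ruined by the single outlier
    print(f"merge by median:    {med:.1f}")       # prints 2.0
```

Both runs use the same split and the same local estimator; only the merging rule differs, which isolates the effect of the merging strategy on robustness.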


Summary

Introduction

According to IBM (2015), “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.” Existing results for median-based merging strategies have several pitfalls related to the deviation rates, and in most cases the known guarantees are suboptimal. These guarantees suggest that estimators obtained via the median-based approach are very sensitive to the choice of k, the number of partitions. The location parameters of symmetric distributions admit many robust estimators of the form (1), the sample median being a notable example. This intuition allows us to establish a parallel between the non-asymptotic deviation guarantees for distributed estimation procedures of the form (1) and the degree of symmetry of the “local” estimators, quantified by the rates of convergence in normal approximation.
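
The role of symmetry can be illustrated with a small exact computation. Suppose, purely as an illustrative assumption not taken from the paper, that the observations are i.i.d. exponential with rate λ and that each of the k subsamples has size m = n/k. The local maximum likelihood estimator is then m/S with S ~ Gamma(m, λ) (shape m, rate λ), whose sampling distribution is skewed. As k grows, the median of the k local estimators concentrates around the median of this sampling distribution, which equals λ·m/median(Gamma(m, 1)) rather than λ. The sketch below computes that ratio by bisection on the Gamma CDF, written as a Poisson tail; the helper names are hypothetical.

```python
import math


def gamma_cdf_integer_shape(x: float, m: int) -> float:
    """P(S <= x) for S ~ Gamma(m, 1) with integer m, i.e. a sum of m Exp(1) variables.

    Uses P(S <= x) = 1 - P(Poisson(x) <= m - 1); each Poisson term is computed
    in log space to avoid overflow/underflow for moderately large m.
    """
    if x <= 0:
        return 0.0
    log_x = math.log(x)
    tail = sum(math.exp(-x + i * log_x - math.lgamma(i + 1)) for i in range(m))
    return 1.0 - tail


def gamma_median(m: int, iterations: int = 200) -> float:
    """Median of Gamma(m, 1), found by bisection on the increasing CDF."""
    lo, hi = 0.0, m + 10.0 * math.sqrt(m) + 10.0  # CDF(hi) is well above 1/2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_integer_shape(mid, m) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def median_of_local_mle_ratio(m: int) -> float:
    """median(local MLE) / lambda for the exponential rate with subsample size m.

    The local MLE is m / S with S ~ Gamma(m, lambda) = Gamma(m, 1) / lambda, so its
    median is lambda * m / median(Gamma(m, 1)); the ratio does not depend on lambda.
    """
    return m / gamma_median(m)


if __name__ == "__main__":
    n = 1000  # hypothetical total sample size
    for k in (10, 50, 100, 500):
        m = n // k
        print(f"k={k:4d}  m={m:4d}  median of local MLE / lambda = "
              f"{median_of_local_mle_ratio(m):.4f}")
```

With n fixed, increasing k shrinks m and pushes the ratio further above 1 (roughly like 1 + 1/(3m), since the median of Gamma(m, 1) is close to m − 1/3). This gives one simple way to see why median-based merging is sensitive to the choice of k when the local estimators are not symmetric.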

Findings
Median Error

