Abstract

We propose a strategy for computing estimators in some non-standard M-estimation problems, where the data are distributed across different servers and the observations across servers, though independent, can come from heterogeneous sub-populations, thereby violating the identically distributed assumption. Our strategy fixes the super-efficiency phenomenon observed in prior work on distributed computing in (i) the isotonic regression framework, where averaging several isotonic estimates (each computed at a local server) on a central server produces super-efficient estimates that do not replicate the properties of the global isotonic estimator, i.e. the isotonic estimate that would be constructed by transferring all the data to a single server, and (ii) certain types of M-estimation problems involving optimization of discontinuous criterion functions where M-estimates converge at the cube-root rate. The new estimators proposed in this paper work by smoothing the data on each local server, communicating the smoothed summaries to the central server, and then solving a non-linear optimization problem at the central server. They are shown to replicate the asymptotic properties of the corresponding global estimators, and also overcome the super-efficiency phenomenon exhibited by existing estimators.
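
The contrast at the heart of (i) can be made concrete with a short simulation. The sketch below is illustrative only: the data-generating model, the number of servers, and the evaluation grid are our own assumptions rather than the paper's, and scikit-learn's IsotonicRegression stands in for the local and global isotonic fits. It computes the pooled-by-averaging estimator (an isotonic fit on each server, the fits averaged centrally) alongside the global isotonic estimator obtained by pooling all the data.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Simulated data (illustrative): monotone signal mu(x) = x^2 on [0, 1] plus noise.
N, L = 10_000, 20                      # total sample size and number of servers (assumed)
x = rng.uniform(0.0, 1.0, size=N)
y = x**2 + rng.normal(scale=0.3, size=N)

# Scatter the N pairs across the L servers at random, mimicking "scrambled" storage.
server_of = rng.integers(0, L, size=N)

grid = np.linspace(0.05, 0.95, 19)     # points at which the regression function is estimated

# Pooled-by-averaging estimator: an isotonic fit on each local server,
# then the L local fits are averaged at the central server.
local_fits = []
for server in range(L):
    idx = server_of == server
    iso = IsotonicRegression(out_of_bounds="clip").fit(x[idx], y[idx])
    local_fits.append(iso.predict(grid))
pooled_by_averaging = np.mean(local_fits, axis=0)

# Global estimator: the isotonic fit obtained by transferring all the data to one server.
global_fit = IsotonicRegression(out_of_bounds="clip").fit(x, y).predict(grid)

print("pooled-by-averaging:", np.round(pooled_by_averaging[:5], 3))
print("global isotonic fit:", np.round(global_fit[:5], 3))
```

The averaging step in this pipeline is exactly the one that, as described above, produces super-efficient estimates that do not replicate the properties of the global isotonic estimator.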

Highlights

  • Distributed computing has become significant in the practice of statistics as well as in other branches of data science

  • As the literature on distributed computing is enormous, here we provide a selection of instances of research on distributed computing problems in a variety of statistical/machine-learning contexts: see, e.g., [10], [12], [26], [27], [6], [19], [24]

  • Our goal in this paper is to propose new estimators under the divide-and-conquer (DC) framework, both for the monotone function estimation problem and for certain versions of the M-estimation setting of [20], that do not suffer from the super-efficiency problem of the pooled-by-averaging estimators and that recover the limiting properties of the corresponding global estimators

Summary

Background

Distributed computing has become significant in the practice of statistics as well as in other branches of data science. BDS and [20] demonstrate, in both problems, that the maximal MSE of the pooled-by-averaging estimator over a collection of models in a neighborhood of a fixed model diverges to ∞ with N, while the maximal MSE of the global estimator remains bounded. In both BDS and [20], super-efficiency results from computing the non-standard estimator at each local machine and averaging these estimators at the central server. To avoid this undesirable phenomenon, the key idea is to reverse these steps: first average the data on each local server in an appropriate manner (which will typically depend on the structure and the dimension of the problem) to obtain essentially sufficient summary statistics, which are then transferred to the central server. The N pairs will be scrambled across a number of different servers (say L), with the same server hosting data from different sub-populations, as well as data from the same sub-population potentially stored on multiple servers.
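
A minimal sketch of this reversed pipeline, under the same assumed simulation setting as above, is given below. Here the local "averaging of the data" is done by simple binning, with each server transmitting only per-bin sums and counts to the central server, where the monotone estimate is then computed by weighted isotonic (monotone least-squares) regression on the combined summaries. The binning smoother, the number of bins, and the use of scikit-learn's IsotonicRegression are our own illustrative choices, not the paper's exact construction.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)

# Same assumed simulation setting: N pairs scrambled across L servers.
N, L = 10_000, 20
x = rng.uniform(0.0, 1.0, size=N)
y = x**2 + rng.normal(scale=0.3, size=N)
server_of = rng.integers(0, L, size=N)

# Step 1 (local servers): smooth the data by averaging responses within K bins.
# Each server transmits only per-bin sums and counts -- small summaries, not raw data.
K = 50
edges = np.linspace(0.0, 1.0, K + 1)

def local_summary(xs, ys):
    bins = np.clip(np.digitize(xs, edges) - 1, 0, K - 1)
    return np.bincount(bins, weights=ys, minlength=K), np.bincount(bins, minlength=K)

summaries = [local_summary(x[server_of == s], y[server_of == s]) for s in range(L)]

# Step 2 (central server): combine the summaries into smoothed pseudo-observations ...
total_sums = sum(s for s, _ in summaries)
total_counts = sum(c for _, c in summaries)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = total_counts > 0
smoothed = total_sums[mask] / total_counts[mask]

# ... and solve the monotone least-squares problem on those summaries
# (isotonic regression is the relevant constrained optimization in the monotone case).
central_fit = IsotonicRegression(out_of_bounds="clip").fit(
    centers[mask], smoothed, sample_weight=total_counts[mask]
)

grid = np.linspace(0.05, 0.95, 19)
print("smooth-then-isotonize fit:", np.round(central_fit.predict(grid), 3))
```

In this sketch the communication per server is O(K) numbers rather than O(N/L) raw observations, and the constrained optimization is performed only once, at the central server, on the smoothed summaries.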

The new estimator for the regression function
Computational considerations
Characterization of the new estimators
Notation and assumptions
The regression function μ satisfies
Uniformly bounded MSE property of the new estimators
Asymptotic distributions
The location parameter problem
Theoretical properties of the pooled estimator
Discussion
Preparatory lemmas
Limited simulation results