Distributed one-step upgraded estimation for non-uniformly and non-randomly distributed data

Feifei Wang, Yingqiu Zhu, Danyang Huang, Haobo Qi, Hansheng Wang

doi:10.1016/j.csda.2021.107265

Abstract

One-shot-type (or divide-and-conquer) estimators have been widely used for distributed statistical analysis. However, their outstanding statistical efficiency hinges on two critical conditions. The first is the uniformity condition, which requires that the sample sizes allocated to different Workers be as comparable as possible. The second is the randomness condition, which requires that the data be distributed across Workers as randomly as possible. Both conditions are often violated in practice. The violation of either condition can seriously degrade the statistical efficiency of one-shot estimators, or even make them inconsistent. To fix this problem, a novel one-step upgraded pilot (OSUP) method is proposed. In the first step of the algorithm, a pilot estimate is computed based on randomly selected samples from different Workers. In the second step, one-step updating is conducted based on the pilot estimate by summarizing the derivative information on each Worker. The resulting OSUP estimator is theoretically proved to be as statistically efficient as the whole-sample maximum likelihood estimator, without any restrictive assumptions about distribution uniformity or randomness. Extensive numerical studies are presented to demonstrate the finite-sample performance of the OSUP estimator. Finally, as an illustration, an American Airlines dataset is analyzed on a Spark cluster.
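The two steps described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract itself does not fix:

- It uses a logistic regression model.
- Workers are simulated as in-memory NumPy arrays rather than Spark partitions.
- Every Worker contributes the same fraction `pilot_frac` of its rows to the pilot sample.
- Step 2 is taken to be the standard one-step Newton-Raphson update, theta_OSUP = theta_pilot - [sum_k H_k(theta_pilot)]^{-1} sum_k g_k(theta_pilot). Here g_k and H_k are the gradient and Hessian of the log-likelihood on Worker k.

All function names (`osup_estimate`, `simulate_workers`, etc.) are hypothetical. The sketch does not reproduce the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loglik_derivatives(X, y, theta):
    """Gradient and Hessian of the logistic log-likelihood evaluated at theta."""
    p = sigmoid(X @ theta)
    grad = X.T @ (y - p)
    hess = -(X.T * (p * (1.0 - p))) @ X  # -sum_i w_i x_i x_i^T
    return grad, hess


def newton_mle(X, y, n_iter=25):
    """MLE via a fixed number of Newton-Raphson iterations starting from zero."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad, hess = loglik_derivatives(X, y, theta)
        theta = theta - np.linalg.solve(hess, grad)
    return theta


def osup_estimate(workers, pilot_frac, rng):
    """Hypothetical OSUP sketch; workers is a list of (X_k, y_k), one per Worker."""
    # Step 1: each Worker sends a random subsample (same fraction on every
    # Worker); the Master fits a pilot estimate on the pooled subsample.
    Xs, ys = [], []
    for X_k, y_k in workers:
        m = int(round(pilot_frac * len(y_k)))
        idx = rng.choice(len(y_k), size=m, replace=False)
        Xs.append(X_k[idx])
        ys.append(y_k[idx])
    theta_pilot = newton_mle(np.vstack(Xs), np.concatenate(ys))

    # Step 2: each Worker evaluates its gradient and Hessian at the pilot;
    # the Master sums them and takes a single Newton-Raphson step.
    d = theta_pilot.size
    grad, hess = np.zeros(d), np.zeros((d, d))
    for X_k, y_k in workers:
        g_k, h_k = loglik_derivatives(X_k, y_k, theta_pilot)
        grad += g_k
        hess += h_k
    return theta_pilot - np.linalg.solve(hess, grad)


def simulate_workers(n, theta, sizes, rng):
    """Non-uniform, non-random allocation: rows are sorted by the first
    covariate and cut into contiguous blocks of unequal sizes."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, theta.size - 1))])
    y = (rng.random(n) < sigmoid(X @ theta)).astype(float)
    order = np.argsort(X[:, 1])
    X, y = X[order], y[order]
    cuts = np.cumsum(sizes)[:-1]
    return list(zip(np.split(X, cuts), np.split(y, cuts))), X, y


# Usage: four Workers of very different sizes, each holding a sorted block.
rng = np.random.default_rng(0)
theta_true = np.array([0.5, -1.0, 1.0])
workers, X_all, y_all = simulate_workers(
    20000, theta_true, [10000, 6000, 3000, 1000], rng
)
theta_osup = osup_estimate(workers, pilot_frac=0.05, rng=rng)
theta_full = newton_mle(X_all, y_all)  # whole-sample MLE, for comparison
print("OSUP:", theta_osup, "full MLE:", theta_full)
```

In this sketch each Worker transmits its pilot subsample once. It then transmits only a d-dimensional gradient and a d x d Hessian, so step 2 needs a single communication round regardless of how unevenly the rows are allocated. The printed OSUP estimate should lie close to the whole-sample MLE.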

Full Text