Abstract
This paper studies distributed learning in the nonparametric regression framework. With sufficient computational resources, the efficiency of distributed algorithms improves as the number of machines grows; we analyze how the number of machines affects statistical optimality. We establish an upper bound on the number of machines under which the statistical minimax rate is achieved, in two settings: nonparametric estimation and hypothesis testing. Our framework is more general than existing work: we build a unified frame for distributed inference across various regression problems, including thin-plate splines and additive regression under random design with univariate, multivariate, and diverging-dimensional covariates. The main tool for achieving this goal is a tight bound on an empirical process, obtained by introducing the Green function for equivalent kernels. Thorough numerical studies support the theoretical findings.
Highlights
In a distributed computing environment, a common practice is to distribute a massive data set to multiple processors and aggregate local results obtained from separate machines into global counterparts.
We begin by introducing some background on reproducing kernel Hilbert spaces (RKHS) and our nonparametric testing formulation under distributed kernel ridge regression.
Throughout we assume that f ∈ H, where H ⊂ L²_π(X) is a reproducing kernel Hilbert space (RKHS) associated with an inner product ⟨·, ·⟩_H and a reproducing kernel function R(·, ·) : X × X → ℝ.
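As a concrete, hypothetical illustration of this setup, the snippet below takes the Gaussian kernel as one common choice of R and checks numerically that its Gram matrix is positive semidefinite (the defining property of a reproducing kernel) and that the reproducing property ⟨f, R(x, ·)⟩_H = f(x) holds for a function f in the span of kernel sections. The kernel, bandwidth, centers, and coefficients are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def R(u, v, bw=0.5):
    """Gaussian reproducing kernel R(u, v) -- an illustrative choice."""
    return np.exp(-(u - v) ** 2 / (2 * bw ** 2))

# Gram matrix on a few points: a valid reproducing kernel must make it
# positive semidefinite (all eigenvalues >= 0 up to numerical error).
z = np.array([0.1, 0.4, 0.9])
gram = R(z[:, None], z[None, :])
assert np.all(np.linalg.eigvalsh(gram) >= -1e-10)

# For f = sum_j c_j R(z_j, .) in the span of kernel sections, the RKHS
# inner product <f, R(x, .)>_H equals sum_j c_j R(z_j, x), i.e. the point
# evaluation f(x).
c = np.array([1.0, -2.0, 0.5])
x = 0.3
f_x = (c * R(z, x)).sum()   # f evaluated at x
inner = c @ R(z, x)         # <f, R(x, .)>_H via the reproducing property
assert np.isclose(inner, f_x)
```

The reproducing property is what makes point evaluation continuous on H, which the kernel ridge regression machinery relies on.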
Summary
In a distributed computing environment, a common practice is to distribute a massive data set to multiple processors and aggregate local results obtained from separate machines into global counterparts. We characterize the upper bounds of s for achieving statistical optimality based on quantifying an empirical process. In the particular smoothing spline regression example, we establish a tight bound of the empirical process by introducing the Green function for equivalent kernels, leading to a polynomial-order improvement of s compared with [30]. We derive the null limit distribution of the test statistics and characterize how the number of processors s affects the minimax optimality of testing. We obtain a minimax rate of testing for nonparametric additive models with a diverging number of components. Such a rate is crucial in obtaining the upper bound of s for optimal testing and is of independent interest.