Abstract
This paper studies distributed learning in the nonparametric regression framework. With sufficient computational resources, the efficiency of distributed algorithms improves as the number of machines grows; we analyze how the number of machines affects statistical optimality. We establish an upper bound on the number of machines under which the statistical minimax rate is achieved, in two settings: nonparametric estimation and hypothesis testing. Our framework is more general than existing work: we build a unified frame for distributed inference across various regression problems, including thin-plate splines and additive regression under random design with univariate, multivariate, and diverging-dimensional covariates. The main tool for achieving this goal is a tight bound on an empirical process, obtained by introducing the Green function for equivalent kernels. Thorough numerical studies support the theoretical findings.
Highlights
In a distributed computing environment, a common practice is to distribute a massive data set to multiple processors and aggregate local results obtained from separate machines into global counterparts.
We begin by introducing some background on reproducing kernel Hilbert spaces (RKHS) and our nonparametric testing formulation under distributed kernel ridge regression.
Throughout we assume that f ∈ H, where H ⊂ L²_π(X) is a reproducing kernel Hilbert space (RKHS) associated with an inner product ⟨·, ·⟩_H and a reproducing kernel function R(·, ·) : X × X → ℝ.
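As a concrete, hypothetical illustration of this setup, the snippet below takes the Gaussian kernel as one common choice of R and checks numerically that its Gram matrix is positive semidefinite (the defining property of a reproducing kernel) and that the reproducing property ⟨f, R(x, ·)⟩_H = f(x) holds for a function f in the span of kernel sections. The kernel, bandwidth, centers, and coefficients are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def R(u, v, bw=0.5):
    """Gaussian reproducing kernel R(u, v) -- an illustrative choice."""
    return np.exp(-(u - v) ** 2 / (2 * bw ** 2))

# Gram matrix on a few points: a valid reproducing kernel must make it
# positive semidefinite (all eigenvalues >= 0 up to numerical error).
z = np.array([0.1, 0.4, 0.9])
gram = R(z[:, None], z[None, :])
assert np.all(np.linalg.eigvalsh(gram) >= -1e-10)

# For f = sum_j c_j R(z_j, .) in the span of kernel sections, the RKHS
# inner product <f, R(x, .)>_H equals sum_j c_j R(z_j, x), i.e. the point
# evaluation f(x).
c = np.array([1.0, -2.0, 0.5])
x = 0.3
f_x = (c * R(z, x)).sum()   # f evaluated at x
inner = c @ R(z, x)         # <f, R(x, .)>_H via the reproducing property
assert np.isclose(inner, f_x)
```

The reproducing property is what makes point evaluation continuous on H, which the kernel ridge regression machinery relies on.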
Summary
In a distributed computing environment, a common practice is to distribute a massive data set to multiple processors and aggregate local results obtained from separate machines into global counterparts. We characterize the upper bounds of s for achieving statistical optimality based on quantifying an empirical process. In the particular smoothing spline regression example, we establish a tight bound of the empirical process by introducing the Green function for equivalent kernels, leading to a polynomial-order improvement of s compared with [30]. We derive the null limit distribution of the test statistics and characterize how the number of processors s affects the minimax optimality of testing. We obtain a minimax rate of testing for nonparametric additive models with a diverging number of components. Such a rate is crucial in obtaining the upper bound of s for optimal testing and is of independent interest.