Abstract

We propose a communication-efficient distributed learning algorithm for high-dimensional sparse linear regression models in settings where the data are stored across multiple machines. Our approach is a distributed version of the SDAR method [Huang J, Jiao Y, Liu Y, et al. A constructive approach to l0 penalized regression. J Mach Learn Res. 2018;19(1):403–439] for solving the KKT system of the ℓ0-penalized least squares problem. At each step of the proposed method, the reduced least squares problem is solved by the steepest descent method, which only requires each node machine to compute and communicate its local gradient vector rather than the raw data. We refer to this method as SD-SDAR for brevity. Under some regularity conditions, we obtain sharp error bounds for the solution sequence generated by the SD-SDAR algorithm. We investigate the computational complexity and show that the number of rounds of communication is bounded in terms of J, R, N and p, where J is the number of important predictors, R is the relative magnitude of the non-zero target coefficients, N is the total sample size and p is the dimension of the covariates. Simulation studies illustrate that SD-SDAR outperforms some existing distributed methods in accuracy, efficiency and support recovery.
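
To make the communication pattern concrete, the following is a minimal sketch (in Python/NumPy, with hypothetical function names and data layout) of how the reduced least-squares problem on the current active set could be solved by steepest descent while exchanging only gradient-sized vectors between machines. The exact step-size rule, stopping criterion and active-set update used in SD-SDAR may differ; this illustrates the gradient-aggregation idea, not the authors' implementation.

```python
import numpy as np


def local_gradient(X_k, y_k, beta, active):
    """Gradient of the local least-squares loss restricted to the active set.
    Only this vector of length len(active) needs to leave the node machine."""
    r_k = X_k[:, active] @ beta[active] - y_k
    return X_k[:, active].T @ r_k


def distributed_steepest_descent(Xs, ys, active, n_iters=50, tol=1e-10):
    """Solve the reduced least-squares problem on `active` by steepest descent.

    Xs, ys: per-machine design matrices and responses (hypothetical layout).
    Each round, every machine communicates one gradient vector and one scalar,
    never its raw data.
    """
    p = Xs[0].shape[1]
    N = sum(X.shape[0] for X in Xs)
    beta = np.zeros(p)
    for _ in range(n_iters):
        # Aggregate per-machine gradients of (1/2N) * sum_k ||X_k beta - y_k||^2.
        g = sum(local_gradient(X, y, beta, active) for X, y in zip(Xs, ys)) / N
        if np.linalg.norm(g) < tol:
            break
        # Exact line search for a quadratic objective:
        #   step = ||g||^2 / (g' A g),  A = (1/N) * sum_k X_k[:, active]' X_k[:, active],
        # where g' A g is again assembled from per-machine scalars.
        gAg = sum(np.sum((X[:, active] @ g) ** 2) for X in Xs) / N
        beta[active] -= (g @ g / gAg) * g
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, active = 100, [3, 17, 42]
    Xs = [rng.standard_normal((200, p)) for _ in range(4)]  # 4 machines
    beta_true = np.zeros(p)
    beta_true[active] = [2.0, -1.5, 3.0]
    ys = [X @ beta_true + 0.1 * rng.standard_normal(200) for X in Xs]
    print(np.round(distributed_steepest_descent(Xs, ys, active)[active], 2))
```

In this toy run, each of the four machines holds 200 local observations, yet per round only a vector of length len(active) and one scalar cross the network from each machine, which is the source of the communication savings described above.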
