Abstract

We propose a communication-efficient distributed learning algorithm for high-dimensional sparse linear regression models in the setting where the data are stored across multiple machines. Our approach is a distributed version of the SDAR method [Huang J, Jiao Y, Liu Y, et al. A constructive approach to l0 penalized regression. J Mach Learn Res. 2018;19(1):403–439] for solving the KKT system of the regularized least squares problem. At each step of the proposed method, the reduced least squares problem is solved by the steepest descent method, which only requires each node machine to compute and communicate its gradient vector rather than its data. We refer to this method as SD-SDAR for brevity. Under some regularity conditions, we obtain sharp error bounds for the solution sequence generated by the SD-SDAR algorithm. We investigate the computational complexity and show that the number of rounds of communication is bounded in terms of J, R, N and p, where J is the number of important predictors, R is the relative magnitude of the non-zero target coefficients, N is the total sample size and p is the dimension of the covariates. Simulation studies illustrate that SD-SDAR outperforms some existing distributed methods in accuracy, efficiency and support recovery.
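
To make the communication pattern described in the abstract concrete, here is a minimal Python sketch of one steepest-descent round on the reduced least squares problem. It is not the paper's implementation: the shard layout, the function names, and the exact line search are illustrative assumptions; only the idea that each node communicates a gradient vector (and a scalar for the step size) instead of its data comes from the abstract.

```python
import numpy as np

def local_gradient(X_m, y_m, beta):
    # Gradient of the local least-squares loss 0.5 * ||y_m - X_m beta||^2 on one node.
    return X_m.T @ (X_m @ beta - y_m)

def steepest_descent_round(shards, beta, active_set):
    # One steepest-descent step on the reduced least squares problem,
    # restricted to the current active set; inactive coordinates stay zero.
    # Each node sends only its length-p local gradient (one round of communication).
    grad = sum(local_gradient(X_m, y_m, beta) for X_m, y_m in shards)

    # Search direction: negative gradient on the active coordinates only.
    d = np.zeros_like(beta)
    d[active_set] = -grad[active_set]

    # Exact line search for a quadratic loss: alpha = ||g_A||^2 / sum_m ||X_m d||^2;
    # the denominator is a single scalar aggregated across nodes.
    denom = sum(float(np.sum((X_m @ d) ** 2)) for X_m, _ in shards)
    if denom == 0.0:
        return beta
    alpha = float(grad[active_set] @ grad[active_set]) / denom
    return beta + alpha * d
```

Under these assumptions, each round moves O(p) numbers per machine, independent of the local sample size, which is the source of the communication efficiency claimed for SD-SDAR.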
