Abstract
Data preconditioning, which reduces the condition number of a problem by applying a linear transformation to the data matrix, is commonly used to accelerate the convergence of first-order optimization methods for regularized loss minimization. An obvious limitation of the technique is its prohibitive computational cost on large-scale problems, especially those with a very large number of samples. In this paper, we propose a gradient preconditioning trick and combine it with mini-batch SGD. The resulting gradient-preconditioned mini-batch SGD algorithm indeed accelerates convergence, and does so at a lower computational cost than data preconditioning for ridge regression. Concretely, we use recent random projection and linear sketching methods to compute a randomized low-rank approximation of the data matrix, from which we construct a suitable preconditioner via standard numerical linear algebra. Finally, we apply the resulting preconditioner to the gradient to reduce the computational cost. Experimental results on both synthetic and real data sets validate the feasibility and effectiveness of our trick and algorithm.
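To make the pipeline concrete, the following is a minimal sketch (not the authors' exact algorithm) of gradient-preconditioned mini-batch SGD for the ridge objective (1/2n)||Ax - b||^2 + (lam/2)||x||^2. It assumes a Halko-style randomized range finder for the low-rank step and an approximate inverse-Hessian preconditioner of the form V diag(1/(s^2/n + lam)) V^T + (I - V V^T)/lam; the rank k, step size, and batch size are illustrative choices.

```python
import numpy as np

def randomized_lowrank(A, k, oversample=10, rng=None):
    """Randomized rank-k approximation of A via a Gaussian sketch (sketch-and-SVD)."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    Omega = rng.standard_normal((d, k + oversample))
    Y = A @ Omega                      # n x (k+p) sketch of the range of A
    Q, _ = np.linalg.qr(Y)             # orthonormal basis for the sketch
    B = Q.T @ A                        # small (k+p) x d matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return s[:k], Vt[:k]               # top-k singular values / right singular vectors

def make_preconditioner(A, lam, k):
    """Approximate inverse of the ridge Hessian H = A^T A / n + lam * I
    using the rank-k sketch: P = V diag(1/(s^2/n + lam)) V^T + (I - V V^T) / lam."""
    n = A.shape[0]
    s, Vt = randomized_lowrank(A, k)
    d_inv = 1.0 / (s ** 2 / n + lam)

    def apply_P(g):
        Vg = Vt @ g
        return Vt.T @ (d_inv * Vg) + (g - Vt.T @ Vg) / lam

    return apply_P

def precond_minibatch_sgd(A, b, lam, k, lr=1.0, batch=64, epochs=20, rng=None):
    """Mini-batch SGD for ridge regression with the sketched preconditioner
    applied to the gradient (the data matrix itself is never transformed)."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    x = np.zeros(d)
    apply_P = make_preconditioner(A, lam, k)   # built once, before the SGD loop
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            j = idx[start:start + batch]
            # stochastic ridge gradient on the mini-batch
            g = A[j].T @ (A[j] @ x - b[j]) / len(j) + lam * x
            x -= lr * apply_P(g)               # precondition the gradient, not the data
    return x
```

The key cost difference the abstract points to is visible here: the sketch and preconditioner are computed once at O(ndk) cost, and each SGD step only pays an extra O(dk) to apply the preconditioner to the gradient, rather than transforming the full n x d data matrix as in data preconditioning.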