Weighted SGD for ℓp Regression with Randomized Preconditioning

Jiyan Yang, Yin-Lam Chow, Michael W Mahoney, Christopher Ré

doi:10.1137/1.9781611974331.ch41

Abstract

In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computational complexity. Such SGD convergence rates are superior to those of other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. In particular, when solving ℓ1 regression with size n by d, pwSGD returns an approximate solution with ε relative error in the objective value in 𝒪(log n·nnz(A) + poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in prediction norm in 𝒪(log n·nnz(A) + poly(d) log(1/ε)/ε) time. We show that for unconstrained ℓ2 regression, this complexity is comparable to that of RLA and is asymptotically better than that of several state-of-the-art solvers in the regime where the desired accuracy ε, high dimension n and low dimension d satisfy d ≥ 1/ε and n ≥ d²/ε. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that new ideas will still be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that pwSGD converges to a medium-precision solution, e.g., ε = 10⁻³, more quickly.
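The three stages named in the abstract (RLA preconditioning, an importance sampling distribution, and weighted SGD on the preconditioned system) can be illustrated with a minimal sketch for the unconstrained ℓ2 case only. This is not the authors' implementation. The function name `pwsgd_l2`, its parameters, the dense Gaussian sketch, the exact row norms of the preconditioned matrix, and the constant step size are all assumptions made for illustration; in particular, a dense Gaussian sketch does not give the nnz(A)-type running time quoted above. The stochastic reformulation used in the loop is the identity ‖Uy − b‖² = E over i ~ p of (Uᵢy − bᵢ)²/pᵢ.

```python
import numpy as np


def pwsgd_l2(A, b, n_iters=5000, sketch_rows=None, step=None, seed=0):
    """Illustrative preconditioned weighted SGD for min_x ||Ax - b||_2^2.

    Hypothetical helper: names, defaults, and sketch type are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    s = sketch_rows or 10 * d

    # Stage 1, preconditioning: sketch A (dense Gaussian map, an assumption),
    # then take R from a QR factorization of the small sketched matrix.
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    _, R = np.linalg.qr(S @ A)

    # U = A R^{-1}, computed via U^T = R^{-T} A^T; U is expected to be
    # well conditioned when the sketch has enough rows.
    U = np.linalg.solve(R.T, A.T).T

    # Stage 2, importance sampling: probabilities proportional to the
    # squared row norms of U (computed exactly here for simplicity).
    row_sq = np.sum(U * U, axis=1)
    total = row_sq.sum()
    p = row_sq / total

    # Assumed default step: with this choice each update equals a
    # randomized Kaczmarz projection on the preconditioned system.
    if step is None:
        step = 1.0 / (2.0 * total)

    # Stage 3, weighted SGD on min_y ||U y - b||^2 with y = R x.
    y = np.zeros(d)
    for i in rng.choice(n, size=n_iters, p=p):
        residual = U[i] @ y - b[i]
        grad = 2.0 * residual * U[i] / p[i]  # unbiased gradient estimate
        y -= step * grad

    # Map back to the original variables: x = R^{-1} y.
    return np.linalg.solve(R, y)
```

With the default step the iteration converges quickly on consistent systems (b in the range of A); for noisy data a smaller or decaying step size, or iterate averaging, would likely be needed to reach a given accuracy.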

Journal: Proceedings of the ... Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM Symposium on Discrete Algorithms
Publication Date: Dec 21, 2015
Citations: 10
