Abstract

In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger worst-case performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two classes of methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems.

• We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system.
• By rewriting the ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers, including RLA methods with algorithmic leveraging (RLA for short).
• We prove that pwSGD inherits faster convergence rates that depend only on the lower dimension of the linear system, while maintaining low computational complexity. This SGD convergence rate is superior to those of related SGD algorithms such as the weighted randomized Kaczmarz algorithm.
• In particular, when solving ℓ1 regression on an n-by-d problem, pwSGD returns an approximate solution with ε relative error on the objective value in O(log n · nnz(A) + poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints, which is in general more efficient than solving the constrained subproblem required in RLA.
• For ℓ2 regression, pwSGD returns an approximate solution with ε relative error on the objective value and on the solution vector in prediction norm in O(log n · nnz(A) + poly(d) log(1/ε)/ε) time. We show that when solving unconstrained ℓ2 regression, this complexity is comparable to that of RLA and is asymptotically better than several state-of-the-art solvers in the regime where the desired accuracy ε, high dimension n, and low dimension d satisfy d ≥ 1/ε and n ≥ d²/ε.

Finally, the effectiveness of these algorithms is illustrated numerically on both synthetic and real datasets. The results are consistent with our theoretical findings and demonstrate that pwSGD converges to a medium-precision solution, e.g., ε = 10^-3, more quickly than other methods.
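The pipeline summarized above (RLA preconditioning, an importance sampling distribution from the preconditioned rows, then SGD with weighted sampling) can be sketched for the unconstrained ℓ2 case as follows. This is a minimal illustration, not the paper's exact construction: the function name, Gaussian sketch size, and step-size schedule are heuristic assumptions.

```python
import numpy as np

def pwsgd_l2(A, b, n_iters=3000, step=0.1, seed=0):
    """Illustrative pwSGD-style solver for unconstrained least squares
    min_x ||Ax - b||_2, following the three steps in the abstract:
      1. RLA preconditioning: sketch A, take R from a QR factorization
         of the sketch, and work with the preconditioned system A R^{-1}.
      2. Importance sampling: row probabilities proportional to the
         squared row norms of A R^{-1}.
      3. SGD with weighted row sampling on the preconditioned system.
    Sketch size and step schedule are heuristic, not the paper's constants.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # 1) Randomized preconditioning via a Gaussian sketch: QR of S A.
    s = 4 * d  # sketch size (heuristic)
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    _, R = np.linalg.qr(S @ A)
    AR = A @ np.linalg.inv(R)              # preconditioned matrix, n x d

    # 2) Importance sampling distribution from row norms of A R^{-1}.
    p = np.sum(AR ** 2, axis=1)
    p /= p.sum()

    # 3) Weighted SGD on f(y) = ||AR y - b||_2^2 using an unbiased,
    #    importance-weighted stochastic gradient.
    y = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.choice(n, p=p)
        g = 2.0 * (AR[i] @ y - b[i]) * AR[i] / p[i]
        y -= (step / np.sqrt(t)) * g
    return np.linalg.solve(R, y)           # recover x = R^{-1} y
```

Because the system is solved in the well-conditioned coordinates A R^{-1}, the SGD phase behaves as if the problem had condition number close to one, which is what allows the convergence rate to depend only on the low dimension d.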

