Abstract

Variance reduction (VR) methods for finite-sum minimization typically require knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad, a common adaptive gradient method, in the inner loop of SVRG, making it robust to the choice of step-size. When minimizing a sum of n smooth convex functions, we prove that a variant of AdaSVRG requires \(\tilde{O}(n + 1/\epsilon)\) gradient evaluations to achieve an \(O(\epsilon)\)-suboptimality, matching the typical rate but without needing to know problem-dependent constants. Next, we show that the dynamics of AdaGrad exhibit a two-phase behavior: the step-size remains approximately constant in the first phase and then decreases at an \(O(1/\sqrt{t})\) rate. This result may be of independent interest and allows us to propose a heuristic that adaptively determines the length of each inner loop in AdaSVRG. Via experiments on synthetic and real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over standard and other "tune-free" VR methods.
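To make the described combination concrete, here is a minimal sketch (not the paper's reference implementation) of an SVRG outer loop whose inner updates use a diagonal AdaGrad step-size. The function name `adasvrg_sketch`, the arguments `grads`, `eta`, and the fixed inner-loop length `m` are illustrative assumptions; the paper's heuristic adapts the inner-loop length, and the iterate-averaging restart below follows a common SVRG convention rather than the paper's exact pseudocode.

```python
import numpy as np

def adasvrg_sketch(grads, x0, eta=1.0, n_outer=20, m=None, eps=1e-10, rng=None):
    """Illustrative sketch: SVRG with AdaGrad used in the inner loop.

    grads : list of callables, grads[i](x) returns the gradient of f_i at x
    x0    : initial iterate (1-D numpy array)
    eta   : AdaGrad scaling constant (the method aims to be robust to it)
    m     : inner-loop length (fixed here; an adaptive heuristic is assumed away)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(grads)
    m = n if m is None else m
    x = x0.copy()

    for _ in range(n_outer):
        snapshot = x.copy()
        # Full gradient at the snapshot: the O(n) cost of each outer loop.
        full_grad = np.mean([g(snapshot) for g in grads], axis=0)

        # Reset the AdaGrad accumulator at the start of every inner loop.
        accum = np.zeros_like(x)
        inner_sum = np.zeros_like(x)

        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient estimate (standard SVRG correction).
            v = grads[i](x) - grads[i](snapshot) + full_grad
            # AdaGrad step: per-coordinate step-sizes shrink with the
            # accumulated squared gradients, so no fixed step-size is tuned.
            accum += v * v
            x = x - eta * v / (np.sqrt(accum) + eps)
            inner_sum += x

        # Restart the next outer loop from the average of the inner iterates.
        x = inner_sum / m

    return x
```

The sketch reflects the two-phase behavior discussed in the abstract: while the accumulated squared gradients are small, the effective step-size stays roughly constant, and once they grow it decays at roughly a \(1/\sqrt{t}\) rate.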
