Abstract

Reducing the variance of stochastic optimization methods efficiently remains a challenging problem. The most widely used approach to variance reduction is to reuse past gradients. Although exploiting second-order information to track gradients has recently been shown to be effective, work in this direction remains limited because of its high computational burden. In this paper, we introduce a modified form of the implicit gradient transport (MIGT) technique into the classical stochastic variance reduced gradient method (SVRG), which leads to a new method called SVRG–MIGT. SVRG–MIGT exploits second-order information to achieve faster variance reduction without ever computing the Hessian explicitly. We analyze its theoretical convergence properties for strongly convex objective functions and prove that it is comparable to state-of-the-art methods. To further confirm the effectiveness of MIGT, we propose Acc-Prox-SVRG–MIGT, a novel variant obtained by introducing MIGT into the accelerated mini-batch proximal SVRG (Acc-Prox-SVRG) method. In addition, to greatly reduce the computational complexity, we propose L-SVRG–MIGT by introducing MIGT into loopless SVRG (L-SVRG), which computes the full gradient only with a small probability, and we establish a convergence analysis for L-SVRG–MIGT. Experimental results on a wide range of data sets demonstrate the efficacy and robustness of the proposed methods.
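
To make the ingredients concrete, the following is a minimal Python/NumPy sketch of how an implicit-gradient-transport style extrapolated gradient can be combined with a loopless-SVRG loop. The function name `l_svrg_igt_sketch`, the averaging weight, and the exact update rule are illustrative assumptions for exposition only, not the paper's SVRG–MIGT or L-SVRG–MIGT algorithms.

```python
import numpy as np


def l_svrg_igt_sketch(grad_i, x0, n, lr=0.1, p=0.05, n_iters=1000, seed=0):
    """Illustrative sketch (not the paper's algorithm): a loopless-SVRG loop
    whose stochastic gradient is evaluated at an IGT-style extrapolated point.

    `grad_i(x, j)` is assumed to return the gradient of the j-th component
    function at x; all step sizes and weights here are placeholder choices.
    """
    rng = np.random.default_rng(seed)
    x, x_prev = x0.copy(), x0.copy()
    w = x0.copy()                                   # snapshot point
    full_grad = np.mean([grad_i(w, j) for j in range(n)], axis=0)
    v = full_grad.copy()                            # transported gradient estimate

    for t in range(1, n_iters + 1):
        gamma = t / (t + 1.0)                       # IGT-style averaging weight
        # Query the gradient at an extrapolated point; by a first-order Taylor
        # argument this implicitly uses curvature (Hessian) information
        # without ever forming the Hessian.
        z = x + (gamma / (1.0 - gamma)) * (x - x_prev)
        j = int(rng.integers(n))
        # SVRG-style variance-reduced gradient at the extrapolated point.
        g = grad_i(z, j) - grad_i(w, j) + full_grad
        v = (1.0 - gamma) * g + gamma * v           # transport past gradients
        x_prev, x = x, x - lr * v

        # Loopless SVRG: refresh the snapshot only with small probability p,
        # so full gradients are computed rarely.
        if rng.random() < p:
            w = x.copy()
            full_grad = np.mean([grad_i(w, j) for j in range(n)], axis=0)
    return x
```

A usage example would pass `grad_i(x, j) = (a_j @ x - b_j) * a_j` for a least-squares problem with rows `a_j` and targets `b_j`; the key design point illustrated is that curvature enters only through where the gradient is evaluated, never through an explicit Hessian.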
