Abstract

SGD (Stochastic Gradient Descent) is a popular algorithm for large-scale optimization problems because of its low per-iteration cost. However, SGD cannot achieve the linear convergence rate of FGD (Full Gradient Descent) because of the inherent gradient variance. To address this problem, mini-batch SGD was proposed as a trade-off between convergence rate and iteration cost. In this paper, a general CVI (Convergence-Variance Inequality) is presented to state formally the interaction between convergence rate and gradient variance. Then a novel algorithm named SSAG (Stochastic Stratified Average Gradient) is introduced to reduce the gradient variance by combining two techniques: stratified sampling and averaging over iterations, the key idea of SAG (Stochastic Average Gradient). Furthermore, SSAG achieves a linear convergence rate of O((1 − μ/(8CL))ᵏ) with smaller storage and iteration costs, where C ≥ 2 is the number of classes in the training data. This convergence rate depends mainly on the variance between classes, not on the variance within each class. When C ≪ N (N is the training data size), SSAG's convergence rate is much better than SAG's rate of O((1 − μ/(8NL))ᵏ). Our experimental results show that SSAG outperforms SAG and several other algorithms.
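
To make the two ingredients concrete, here is a minimal sketch of an SSAG-style update in Python. It is an illustration under our own assumptions, not the authors' exact algorithm: the names grad_fn, strata, and lr are hypothetical, and the sketch keeps one stored gradient per class (C slots) rather than one per sample (N slots) as in SAG, stepping with the size-weighted average of the stored gradients.

```python
import numpy as np

def ssag(grad_fn, strata, w0, lr=0.1, iters=1000, rng=None):
    """SSAG-style sketch (assumption, not the paper's exact method):
    SAG-style gradient averaging with one stored gradient per stratum.

    grad_fn(w, x, y) -> gradient of the loss at sample (x, y)
    strata: list of (X_c, y_c) arrays, one pair per class c = 1..C
    """
    rng = rng or np.random.default_rng(0)
    w = w0.copy()
    C = len(strata)
    sizes = np.array([len(y) for _, y in strata], dtype=float)
    probs = sizes / sizes.sum()               # sample strata by relative size
    table = np.zeros((C, w.size))             # one stored gradient per stratum
    for _ in range(iters):
        c = rng.choice(C, p=probs)            # stratified sampling: pick a class
        X_c, y_c = strata[c]
        i = rng.integers(len(y_c))            # then a random sample within it
        table[c] = grad_fn(w, X_c[i], y_c[i]) # refresh that stratum's slot
        w -= lr * (probs @ table)             # step with the weighted average
    return w
```

Under this scheme the gradient table costs O(C·d) memory instead of SAG's O(N·d), which is one plausible reading of the smaller storage cost claimed above when C ≪ N.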
