On the Unstable Convergence Regime of Gradient Descent

Shuo Chen, Jiaying Peng, Yao Zhao, Xiaolong Li

doi:10.1609/aaai.v38i10.29017

Abstract

Traditional gradient descent (GD) has been extensively studied for convex or L-smooth functions, and it is widely used in current neural network optimization. The classical descent lemma ensures that, for an L-smooth function, the GD trajectory converges stably towards the minimum when the learning rate is below 2 / L. This convergence is marked by a consistent reduction in the loss function throughout the iterations. However, recent experimental studies have demonstrated that even when the L-smoothness condition is not met, or when the learning rate is increased so that the loss function oscillates during the iterations, the GD trajectory still converges in the long run. This phenomenon is referred to as the unstable convergence regime of GD. In this paper, we present a theoretical perspective that offers a qualitative analysis of this phenomenon. Unstable convergence is in fact an inherent property of GD for general twice-differentiable functions. Specifically, the forward-invariance of GD is established, i.e., any point within a local region will always remain within this region under GD iteration. Then, based on the forward-invariance, for an initialization outside an open set containing the local minimum, the loss function will oscillate during the first several iterations and then become monotonically decreasing once the GD trajectory jumps into the open set. This work theoretically clarifies the unstable convergence phenomenon of GD discussed in previous experimental works. The unstable convergence of GD mainly depends on the choice of initialization, and it is actually inevitable due to the complex nature of the loss function.
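
The behavior described in the abstract can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy loss f(x) = exp(x) - x - 1 (twice differentiable, minimized at x = 0, not globally L-smooth because f''(x) = exp(x) is unbounded), the learning rate 1.5, the starting point x0 = 2, and the open set S = {x : f''(x) < 2 / lr} used as the "local region". For this particular toy function one GD step maps every point into S, so S is forward-invariant, and the descent lemma applies on S.

```python
import math

def loss(x):
    # Toy loss (an assumption, not from the paper): f(x) = exp(x) - x - 1.
    # expm1 keeps the value accurate when x is close to the minimizer 0.
    return math.expm1(x) - x

def grad(x):
    # f'(x) = exp(x) - 1
    return math.expm1(x)

def gradient_descent(x0, lr, steps):
    """Plain GD with a fixed learning rate; returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * grad(xs[-1]))
    return xs

LR = 1.5   # hypothetical learning rate; stable at the minimum since f''(0) = 1 < 2 / LR
X0 = 2.0   # hypothetical initialization, chosen outside the set S below

# S = {x : f''(x) < 2 / LR} = (-inf, log(2 / LR)); an open set containing x = 0.
BOUNDARY = math.log(2.0 / LR)

xs = gradient_descent(X0, LR, steps=25)
losses = [loss(x) for x in xs]

for k, (x, fx) in enumerate(zip(xs, losses)):
    region = "inside S" if x < BOUNDARY else "outside S"
    print(f"k={k:2d}  x={x:+.6f}  loss={fx:.3e}  ({region})")
```

In this run the first step leaves the loss higher than at initialization (roughly 4.39 to 6.58), because the step starts where the curvature exceeds 2 / lr. From the second iterate on, the trajectory stays inside S and the loss decreases at every step. The example is only meant to make the terms "forward-invariance" and "unstable convergence" concrete; it shows a single loss increase rather than the several oscillating iterations described in the abstract, and it is not the construction used in the paper.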

More From: Proceedings of the AAAI Conference on Artificial Intelligence
