Abstract

This paper presents a unified algorithmic framework for nonconvex stochastic optimization, which is needed to train deep neural networks. The unified algorithm includes the existing adaptive-learning-rate optimization algorithms, such as Adaptive Moment Estimation (Adam), Adaptive Mean Square Gradient (AMSGrad), Adam with weighted gradient and dynamic bound of learning rate (GWDC), AMSGrad with weighted gradient and dynamic bound of learning rate (AMSGWDC), and AdaBelief (adapting stepsizes by the belief in observed gradients). The paper also gives convergence analyses of the unified algorithm for constant and diminishing learning rates: with a constant learning rate, the algorithm approximates a stationary point of a nonconvex stochastic optimization problem; with a diminishing learning rate, it converges to a stationary point of the problem. Hence, the analyses show that the existing adaptive-learning-rate optimization algorithms can, in theory, be applied to nonconvex stochastic optimization in deep neural networks. Additionally, this paper provides numerical results showing that the unified algorithm can train deep neural networks in practice. Moreover, it provides numerical comparisons of the unified algorithm with certain heuristic intelligent optimization algorithms on unconstrained minimization of benchmark functions. These comparisons show that a teaching-learning-based optimization algorithm and the unified algorithm perform well.
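
To make the shared structure of these optimizers concrete, below is a minimal Python sketch of the Adam/AMSGrad-style update template they build on (exponential moving averages of the gradient and of its square, followed by a scaled step). The function name adaptive_step, the hyperparameter defaults, and the omission of bias correction are illustrative assumptions; this is not the paper's Algorithm 1.

```python
import numpy as np

def adaptive_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, amsgrad=False, v_hat_max=None):
    """One generic Adam/AMSGrad-style update (illustrative sketch, no bias correction)."""
    m = beta1 * m + (1.0 - beta1) * grad       # exponential moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # exponential moving average of its square
    if amsgrad:
        # AMSGrad-style variant: keep the elementwise maximum of past second moments.
        v_hat_max = v if v_hat_max is None else np.maximum(v_hat_max, v)
        denom = np.sqrt(v_hat_max) + eps
    else:
        denom = np.sqrt(v) + eps
    theta = theta - lr * m / denom             # scaled parameter update
    return theta, m, v, v_hat_max
```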

Highlights

  • A useful way to train deep neural networks is to solve a nonconvex optimization problem in terms of deep neural networks [1]–[3] and find suitable parameters for them

  • This paper presented a unification of the existing adaptive-learning-rate optimization algorithms for nonconvex stochastic optimization in deep neural networks

  • The first analysis showed that the algorithm approximates a stationary point of the problem when it uses a constant learning rate


Summary

INTRODUCTION

A useful way to train deep neural networks is to solve a nonconvex optimization problem defined in terms of the networks' parameters [1]–[3] and thereby find suitable parameters for them. In contrast to previous results, this paper explicitly shows that the existing adaptive-learning-rate optimization algorithms can solve such problems (Subsections IV-A–IV-D). While GWDC and AMSGWDC [2] with diminishing learning rates had previously been applied only to convex optimization (see Table 1), they are covered by the proposed algorithm (Algorithm 1) and thus extend to nonconvex optimization. We would also like to emphasize that the existing algorithms with constant learning rates, including Adam, AMSGrad, GWDC, and AMSGWDC, can be applied to the problem (Subsection IV-E and Table 1). This is in contrast to [2], which presented a regret minimization only for GWDC and AMSGWDC with a diminishing learning rate, and to [11], which presented convergence analyses only for Adam and AMSGrad with constant and diminishing learning rates.
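
As a rough illustration of how a diminishing learning rate enters such a scheme, the following self-contained toy loop applies an Adam-style update with the common schedule lr0 / sqrt(t). The function train_diminishing, the schedule, and the toy gradient oracle are assumptions for illustration only, not the paper's Algorithm 1 or its exact step-size rule.

```python
import numpy as np

def train_diminishing(grad_fn, theta0, steps=1000, lr0=1e-2,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style loop with a diminishing learning rate lr0 / sqrt(t) (illustrative only)."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first-moment (momentum) estimate
    v = np.zeros_like(theta)   # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        lr = lr0 / np.sqrt(t)  # diminishing schedule, a common choice in convergence analyses
        theta -= lr * m / (np.sqrt(v) + eps)
    return theta

# Toy usage on a nonconvex double-well objective f(x) = x**4 / 4 - x**2 / 2,
# whose gradient is x**3 - x; any stochastic gradient oracle could be substituted.
theta_star = train_diminishing(lambda x: x ** 3 - x, np.array([0.3, -2.0]))
```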

OPTIMIZATION IN DEEP NEURAL NETWORKS
CONVERGENCE ANALYSIS OF ALGORITHM 1 WITH A CONSTANT LEARNING RATE
CONVERGENCE ANALYSIS OF ALGORITHM 1 WITH A DIMINISHING LEARNING RATE
COMPARISON OF ADAM WITH ALGORITHM 1 IN THE
COMPARISON OF AMSGrad WITH ALGORITHM 1 IN
DISCUSSION
NUMERICAL EXPERIMENTS
CONCLUSION