Abstract

This paper presents a unified algorithmic framework for nonconvex stochastic optimization, which is needed to train deep neural networks. The unified algorithm includes the existing adaptive-learning-rate optimization algorithms, such as Adaptive Moment Estimation (Adam), Adaptive Mean Square Gradient (AMSGrad), Adam with weighted gradient and dynamic bound of learning rate (GWDC), AMSGrad with weighted gradient and dynamic bound of learning rate (AMSGWDC), and AdaBelief (adapting stepsizes by the belief in observed gradients). The paper also gives convergence analyses of the unified algorithm for constant and diminishing learning rates: with a constant learning rate, the algorithm approximates a stationary point of a nonconvex stochastic optimization problem; with a diminishing learning rate, it converges to a stationary point of the problem. Hence, the analyses show that the existing adaptive-learning-rate optimization algorithms can, in theory, be applied to nonconvex stochastic optimization in deep neural networks. Additionally, this paper provides numerical results showing that the unified algorithm can train deep neural networks in practice. Moreover, it provides numerical comparisons of the unified algorithm with certain heuristic intelligent optimization algorithms on unconstrained minimization of benchmark functions. These comparisons show that a teaching-learning-based optimization algorithm and the unified algorithm perform well.
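
To make the shared structure of these optimizers concrete, below is a minimal Python sketch of the Adam/AMSGrad-style update template they build on (exponential moving averages of the gradient and of its square, followed by a scaled step). The function name adaptive_step, the hyperparameter defaults, and the omission of bias correction are illustrative assumptions; this is not the paper's Algorithm 1.

```python
import numpy as np

def adaptive_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, amsgrad=False, v_hat_max=None):
    """One generic Adam/AMSGrad-style update (illustrative sketch, no bias correction)."""
    m = beta1 * m + (1.0 - beta1) * grad       # exponential moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # exponential moving average of its square
    if amsgrad:
        # AMSGrad-style variant: keep the elementwise maximum of past second moments.
        v_hat_max = v if v_hat_max is None else np.maximum(v_hat_max, v)
        denom = np.sqrt(v_hat_max) + eps
    else:
        denom = np.sqrt(v) + eps
    theta = theta - lr * m / denom             # scaled parameter update
    return theta, m, v, v_hat_max
```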

Highlights

  • A useful way to train deep neural networks is to solve a nonconvex optimization problem in terms of deep neural networks [1]–[3] and find suitable parameters for them

  • This paper presented a unification of the existing adaptive-learning-rate optimization algorithms for nonconvex stochastic optimization in deep neural networks

  • The first analysis showed that the algorithm approximates a stationary point of the problem when it uses a constant learning rate


Summary

INTRODUCTION

A useful way to train deep neural networks is to solve a nonconvex optimization problem defined in terms of the networks' parameters [1]–[3] and thereby find suitable parameters for them. In contrast to previous results, this paper explicitly shows that the existing adaptive-learning-rate optimization algorithms can solve such problems (Subsections IV-A–IV-D). While GWDC and AMSGWDC [2] with diminishing learning rates had previously been applied only to convex optimization (see Table 1), they are covered by the proposed algorithm (Algorithm 1) and thus extend to nonconvex optimization. We would also like to emphasize that the existing algorithms with constant learning rates, including Adam, AMSGrad, GWDC, and AMSGWDC, can be applied to the problem (Subsection IV-E and Table 1). This is in contrast to [2], which presented a regret minimization only for GWDC and AMSGWDC with a diminishing learning rate, and to [11], which presented convergence analyses only for Adam and AMSGrad with constant and diminishing learning rates.
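
As a rough illustration of how a diminishing learning rate enters such a scheme, the following self-contained toy loop applies an Adam-style update with the common schedule lr0 / sqrt(t). The function train_diminishing, the schedule, and the toy gradient oracle are assumptions for illustration only, not the paper's Algorithm 1 or its exact step-size rule.

```python
import numpy as np

def train_diminishing(grad_fn, theta0, steps=1000, lr0=1e-2,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style loop with a diminishing learning rate lr0 / sqrt(t) (illustrative only)."""
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)   # first-moment (momentum) estimate
    v = np.zeros_like(theta)   # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        lr = lr0 / np.sqrt(t)  # diminishing schedule, a common choice in convergence analyses
        theta -= lr * m / (np.sqrt(v) + eps)
    return theta

# Toy usage on a nonconvex double-well objective f(x) = x**4 / 4 - x**2 / 2,
# whose gradient is x**3 - x; any stochastic gradient oracle could be substituted.
theta_star = train_diminishing(lambda x: x ** 3 - x, np.array([0.3, -2.0]))
```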

OPTIMIZATION IN DEEP NEURAL NETWORKS
CONVERGENCE ANALYSIS OF ALGORITHM 1 WITH A CONSTANT LEARNING RATE
CONVERGENCE ANALYSIS OF ALGORITHM 1 WITH A DIMINISHING LEARNING RATE
COMPARISON OF ADAM WITH ALGORITHM 1 IN THE
COMPARISON OF AMSGrad WITH ALGORITHM 1 IN
DISCUSSION
NUMERICAL EXPERIMENTS
CONCLUSION