Abstract

In this paper, we explore the advantages of heuristic mechanisms and devise a new optimization framework named Sequential Motion Optimization (SMO) to strengthen gradient-based methods. The key idea of SMO is inspired by a movement mechanism in a recent metaheuristic method called Balancing Composite Motion Optimization (BCMO). Specifically, SMO establishes a sequential motion chain of two gradient-guided individuals, a leader and a follower, to enhance the effectiveness of parameter updates in each iteration. A surrogate gradient model with low computational cost is theoretically established to estimate the gradient of the follower from that of the leader through the chain rule during training. Experimental results on training quality for both fully-connected multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) on three popular benchmark datasets, MNIST, Fashion-MNIST, and CIFAR-10, demonstrate the superior performance of the proposed framework compared with vanilla stochastic gradient descent (SGD) implemented via the back-propagation (BP) algorithm. Although this study introduces only vanilla gradient descent (GD) as the main gradient-guided factor in SMO for deep neural network (DNN) training, the framework has great potential to be combined with other gradient-based variants to improve its effectiveness and to solve other large-scale optimization problems in practice.
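As a rough illustration of the leader–follower motion chain described above, the sketch below applies an SMO-style update to a toy quadratic objective. The concrete update rules, the attraction coefficient beta, the step size, and the reuse of the leader's gradient as the follower's surrogate are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Toy objective: f(x) = 0.5 * ||x||^2, whose gradient is simply x.
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

rng = np.random.default_rng(0)
leader = rng.normal(size=5)    # leading individual
follower = rng.normal(size=5)  # following individual
lr, beta = 0.1, 0.5            # step size and attraction coefficient (assumed)

for step in range(200):
    g = grad_f(leader)
    # Leader performs a plain gradient-descent move.
    leader = leader - lr * g
    # Follower is attracted toward the leader and takes a gradient-guided
    # step that reuses the leader's gradient as a cheap surrogate for its
    # own (an illustrative stand-in for the paper's chain-rule surrogate).
    follower = follower + beta * (leader - follower) - lr * g
    # Keep the fitter individual as the leader for the next iteration.
    if f(follower) < f(leader):
        leader, follower = follower, leader

print("final loss:", f(leader))  # converges toward 0 on this toy problem
```

In this reading, the follower piggybacks on the leader's gradient instead of computing its own, which is what keeps the surrogate's computational cost low.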
