Abstract

The stochastic gradient descent (SGD) method plays a central role in training deep convolutional neural networks (DCNNs). Recent advances in optimization methods for DCNNs largely follow the direction of the gradient, with innovations focused on managing the history of gradients or automatically adapting the step size. In contrast, this paper proposes a novel optimization approach for training the top dense layer of a DCNN. Rather than following the gradient alone, it primarily steers parameter updates along the direction pointing to the optimal parameter values. The Moore–Penrose inverse is used to estimate the difference between the current and the optimal parameters, and the parameters are updated along this direction to compensate for that difference. The parameters are then fine-tuned along the classical gradient direction. Experiments conducted on a broad selection of benchmark datasets indicate that the proposed approach achieves a higher convergence rate and a lower minimum loss than other state-of-the-art optimization methods. Furthermore, with the same DCNN architectures, the performance improvement over these state-of-the-art optimization approaches is substantial.
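To make the two-step idea concrete, the sketch below illustrates one possible interpretation for a single dense layer with a squared-error objective: the Moore–Penrose inverse yields a least-squares estimate of the "optimal" weights, the current weights are moved toward that estimate, and a classical gradient step follows. The function name, the learning rates lr_pinv and lr_gd, and the use of a squared-error loss are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def pinv_then_gradient_step(W, H, T, lr_pinv=0.5, lr_gd=0.01):
    # W: current dense-layer weights (d x c)
    # H: feature matrix from the frozen convolutional layers (n x d)
    # T: one-hot targets (n x c)

    # Step 1 (assumed): estimate the "optimal" weights as the least-squares
    # solution of H W* ~ T via the Moore-Penrose inverse of H.
    W_star = np.linalg.pinv(H) @ T

    # Move the current weights along the direction pointing to W*,
    # compensating part of the difference between W and W*.
    W = W + lr_pinv * (W_star - W)

    # Step 2 (assumed): fine-tune along the classical gradient of the
    # squared-error loss L = 0.5 * ||H W - T||^2, i.e. grad = H^T (H W - T).
    grad = H.T @ (H @ W - T)
    W = W - lr_gd * grad / H.shape[0]
    return W

# Example usage with random data (n=128 samples, d=64 features, c=10 classes).
rng = np.random.default_rng(0)
H = rng.normal(size=(128, 64))
T = np.eye(10)[rng.integers(0, 10, size=128)]
W = rng.normal(scale=0.01, size=(64, 10))
W = pinv_then_gradient_step(W, H, T)

In this sketch the pseudoinverse step supplies the direction toward the optimal parameters, while the subsequent gradient step plays the fine-tuning role described in the abstract.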
