Abstract

Optimization in a deep neural network is challenging due to the vanishing gradient problem and the intensive fine-tuning of network hyperparameters. Inspired by multistage decision control systems, the stochastic diagonal approximate greatest descent (SDAGD) algorithm is proposed in this article to seek optimal learning weights using a two-phase switching optimization strategy. The proposed optimizer controls the relative step length derived from the long-term optimal trajectory and adopts a diagonal approximated Hessian for efficient weight updates. In Phase-I, it computes the greatest step length at the boundary of each local spherical search region and, subsequently, descends rapidly in the direction of the optimal solution. In Phase-II, it switches automatically to an approximate Newton method once the weights are close to the optimal solution, achieving fast convergence. The experiments show that SDAGD produces steeper learning curves and achieves lower misclassification rates compared with other optimization techniques. Implementation of the proposed optimizer in deeper networks is also investigated in this article to study the vanishing gradient problem.
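To make the two-phase strategy concrete, the following is a minimal, hypothetical sketch of an SDAGD-style update on a toy quadratic. It is not the authors' implementation: the function and parameter names (sdagd_step, radius, switch_threshold) are illustrative assumptions, and the fixed search-region radius and gradient-norm switching rule stand in for the paper's step length derived from the long-term optimal trajectory.

```python
# Hypothetical two-phase SDAGD-style update (illustrative only).
import numpy as np

def sdagd_step(w, grad, diag_hess, radius=0.5, switch_threshold=1e-1):
    """One weight update with a Phase-I / Phase-II switch.

    Phase I  (far from the optimum): step to the boundary of a local
             spherical search region, i.e. a step of fixed length
             `radius` along the negative gradient direction.
    Phase II (near the optimum, small gradient): approximate Newton
             step using a diagonal Hessian approximation.
    """
    grad_norm = np.linalg.norm(grad)
    if grad_norm > switch_threshold:
        # Phase I: greatest descent -- jump to the search-region boundary.
        return w - radius * grad / grad_norm
    # Phase II: diagonal approximate Newton step (element-wise division).
    return w - grad / (diag_hess + 1e-8)

# Toy usage on f(w) = 0.5 * w^T A w with diagonal A, whose Hessian
# diagonal is simply diag(A); the minimizer is the origin.
A = np.diag([4.0, 1.0])
w = np.array([5.0, -3.0])
for _ in range(50):
    grad = A @ w
    w = sdagd_step(w, grad, diag_hess=np.diag(A))
print(w)  # converges close to [0, 0]
```

In this sketch, the phase switch depends only on the gradient norm; the actual SDAGD algorithm adapts the relative step length per iteration rather than using a fixed radius and threshold.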
