Abstract

The learning rate is one of the most influential hyperparameters for the training process and the final accuracy of deep neural networks. However, determining an optimal learning rate remains challenging. A large learning rate can accelerate training, but it may introduce instability and cause the model to miss the global optimum. In contrast, a small learning rate keeps training stable, but it slows training down and makes the process prone to falling into local optima. In this paper, the impact of the learning rate is first analyzed. It is found that a learning rate schedule should consist of two stages in order to balance training speed and accuracy. Based on this observation, this paper proposes an improvement strategy for learning rate schedules: a two-stage strategy that combines a large fixed learning rate with a rapidly decaying learning rate. The proposed strategy is then applied to optimize a series of widely used learning rate settings. Extensive experiments on the CIFAR-10 and CIFAR-100 datasets with VGG19, ResNets, ResNeXt, DenseNets, SENet, and several other models demonstrate that the proposed strategy improves these learning rate settings and enhances the performance of the trained models.
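
The following is a minimal sketch of what such a two-stage schedule could look like in Python; the base rate, the stage boundary, and the exponential decay form are illustrative assumptions and are not values taken from the paper.

```python
# Hypothetical two-stage learning rate schedule:
# stage 1 holds a large fixed learning rate, stage 2 decays it rapidly.
# All constants below are illustrative assumptions, not the paper's settings.

def two_stage_lr(epoch, total_epochs=200, base_lr=0.1,
                 stage1_fraction=0.5, decay_factor=0.9):
    """Return the learning rate for a given epoch.

    Stage 1: constant large learning rate for fast initial progress.
    Stage 2: rapid (exponential-style) decay to refine the solution.
    """
    stage1_end = int(total_epochs * stage1_fraction)
    if epoch < stage1_end:
        return base_lr                                   # large fixed rate
    return base_lr * (decay_factor ** (epoch - stage1_end))  # rapid decay


if __name__ == "__main__":
    for e in (0, 50, 100, 120, 150, 199):
        print(f"epoch {e:3d}: lr = {two_stage_lr(e):.6f}")
```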
