Abstract

The performance of modern deep neural networks depends heavily on the choice of hyperparameters, and the Learning Rate (LR) is among the most important ones to tune. Finding a suitable LR schedule quickly has therefore become a pressing problem for training better-performing models. Current practice often relies on manually configured parallel search methods such as grid search, or serial methods such as Bayesian optimisation. However, the variety and number of models and datasets make the LR search space very large, so within a given time budget these methods consume substantial resources yet find only sub-optimal static LR schedules. In the now widely used Distributed Data Parallel (DDP) setting, the participating nodes themselves constitute a population, which makes it possible to apply population-based algorithms to optimise a searched LR schedule dynamically throughout training with little additional training time. To exploit both the searched LR schedule and the population formed by the participating nodes, this paper proposes the Leader Population Learning Rate Schedule (LPLRS) for the DDP deep learning environment, which continuously explores better learning rates in the neighbourhood of the searched LR schedule and guides the subsequent training process. On the CIFAR-10 and CIFAR-100 test sets, LPLRS achieves higher classification accuracy with the state-of-the-art Wide Residual Network trained with Sharpness-Aware Minimization (WRN(SAM)) than the SGDR, CLR, and StepLR learning rate schedules.
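
The abstract does not give implementation details, but the core idea it describes can be illustrated with a minimal sketch: in a DDP job, each worker perturbs the LR proposed by a searched base schedule, trains for a short period, and the best-performing worker's LR ("leader") guides the next period. All function names, the perturbation scale, and the synchronisation scheme below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of leader-guided population LR exploration in a DDP setting.
# Assumes the process group is already initialised (e.g. launched via torchrun).
import torch
import torch.distributed as dist

def leader_guided_lr(base_lr, model, optimizer, train_step_fn, val_metric_fn,
                     period_steps, sigma=0.1):
    """Run one exploration period and return the leader's LR for the next period."""
    world = dist.get_world_size()
    # Each worker samples an LR in a small neighbourhood of the searched base schedule.
    local_lr = base_lr * (1.0 + sigma * (2.0 * torch.rand(1).item() - 1.0))
    for group in optimizer.param_groups:
        group["lr"] = local_lr
    for _ in range(period_steps):
        train_step_fn(model, optimizer)  # ordinary DDP forward/backward/step
    # Gather every worker's validation score and candidate LR.
    score = torch.tensor([val_metric_fn(model)], dtype=torch.float32)
    lr_t = torch.tensor([local_lr], dtype=torch.float32)
    scores = [torch.zeros_like(score) for _ in range(world)]
    lrs = [torch.zeros_like(lr_t) for _ in range(world)]
    dist.all_gather(scores, score)
    dist.all_gather(lrs, lr_t)
    # The worker with the best validation score becomes the leader;
    # its LR guides the next training period on every worker.
    leader = max(range(world), key=lambda i: scores[i].item())
    return lrs[leader].item()
```

In use, a training loop would call this once per period, feeding the returned leader LR back in as the neighbourhood centre (or blending it with the base schedule's next value) so the schedule is refined continuously during training rather than fixed in advance.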
