Abstract

Learning rate (LR) is one of the most important hyper-parameters in any deep neural network (DNN) optimization process. It controls how quickly the network converges toward a minimum while navigating the non-convex loss surface, whose local minima, saddle points, and similar features affect the performance of the DNN. The conventional way of varying the LR is to decay it by a fixed factor at a fixed number of epochs or to decay it exponentially. Recently, two new approaches for setting the learning rate have been introduced, namely cyclical learning rates and stochastic gradient descent with warm restarts. In both approaches, the learning rate is varied in a cyclic pattern between two boundary values. This paper introduces another warm-restart technique, inspired by these two approaches, that uses the "poly" LR policy. The proposed technique, called polynomial learning rate with warm restart, requires only a single warm restart. The proposed LR policy leads to faster convergence of the DNN and slightly higher classification accuracy. Its performance is demonstrated on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets with CNN, ResNet, and Wide Residual Network (WRN) architectures.
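For intuition, the sketch below shows one plausible form of a "poly" decay schedule with a single warm restart: the LR decays polynomially toward zero, is reset once to its base value, and then decays again over the remaining iterations. The function name, default hyper-parameter values, and the restart point are illustrative assumptions for this sketch, not the paper's exact settings.

```python
def poly_lr_with_warm_restart(iteration, base_lr=0.1, power=0.9,
                              total_iters=10000, restart_iter=5000):
    """Polynomial ("poly") decay, lr = base_lr * (1 - t/T)**power,
    restarted once at restart_iter (illustrative values, not the paper's).

    Before the restart, the schedule decays over [0, restart_iter);
    at restart_iter the LR jumps back to base_lr and decays again
    over the remaining (total_iters - restart_iter) iterations.
    """
    if iteration < restart_iter:
        t, T = iteration, restart_iter
    else:
        t, T = iteration - restart_iter, total_iters - restart_iter
    return base_lr * (1.0 - t / T) ** power


# Example: LR just before and just after the single warm restart.
for it in [0, 2500, 4999, 5000, 7500, 9999]:
    print(it, round(poly_lr_with_warm_restart(it), 5))
```

In practice such a schedule would typically be applied per iteration (or per epoch) by updating the optimizer's learning rate with the returned value.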
