Abstract

Most state-of-the-art results on image classification tasks have been obtained by residual neural networks trained with stochastic gradient descent (SGD) with momentum. In most cases, the learning rate is dropped by a constant factor after every pre-defined number of epochs; however, estimating how many epochs to wait before each drop is difficult and time-consuming. To tackle this problem, cyclical learning rates have gained popularity in gradient-based optimization for improving convergence speed in accelerated gradient schemes. But a cyclical learning rate scheme scans a broad range of learning rates, some of which are not suitable for deep neural network training. In this paper, we propose a simple yet effective exponentially decaying sine-wave learning rate schedule for SGD that improves its convergence speed. During training, the learning rate varies as a sine wave whose maximum value decays exponentially with the training epochs. An ensemble of wide residual networks trained with our proposed learning rate schedule achieves 3.01% and 16.03% errors on CIFAR-10 and CIFAR-100, respectively. Furthermore, our proposed method uses far fewer epochs than most recent learning rate strategies, accelerating neural network training tremendously.
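As a minimal sketch of how such a schedule could look in practice, the snippet below oscillates the learning rate as a sine wave and shrinks the wave's peak exponentially once per cycle. The function name, period, decay factor, and bounds are illustrative assumptions, not the paper's exact formula.

```python
import math

def sine_decay_lr(epoch, lr_max=0.1, lr_min=0.001, period=10, decay=0.9):
    """Illustrative sine-wave learning rate with an exponentially decaying peak.

    NOTE: all parameter names and values here are assumptions for the sketch;
    the paper's precise schedule may differ.
    """
    cycle = epoch // period                 # index of the current sine cycle
    peak = lr_max * (decay ** cycle)        # exponentially decayed maximum value
    phase = (epoch % period) / period       # position within the current cycle
    # Oscillate between lr_min and the current peak following a sine wave.
    return lr_min + 0.5 * (peak - lr_min) * (1 + math.sin(2 * math.pi * phase))

# Example: print the learning rate for the first 30 epochs.
for e in range(30):
    print(e, round(sine_decay_lr(e), 5))
```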
