Abstract

Deep neural networks (DNNs) are currently the best-performing method for many classification problems. For training DNNs, the learning rate is the most important hyper-parameter, whose choice greatly affects the performance of the model. In recent years, several learning rate schedulers, such as HTD, CLR, and SGDR, have been proposed. Some of these methods use a cycling mechanism to improve the convergence speed and accuracy of DNNs, but suffer performance degradation during convergence; others achieve good accuracy but converge too slowly. This paper proposes a new learning rate schedule called piecewise arc cotangent decay learning rate (PACL), which not only improves the convergence speed and accuracy of DNNs but also significantly reduces the performance degradation zone caused by the cycling mechanism. It is easy to implement and incurs almost no extra computing expense. Finally, we demonstrate the effectiveness of PACL by training ResNet, DenseNet, WRN, SEResNet, and MobileNet on CIFAR-10, CIFAR-100, and Tiny ImageNet.

Highlights

  • Deep learning is an active field of machine learning

  • Piecewise arc cotangent decay learning rate (PACL) combines the advantages of piecewise decay and the cyclical learning rate (CLR): it adopts the cycling mechanism of CLR and decays the learning rate piecewise within each cycle

  • Compared with other learning rate schedulers with cycling mechanisms, PACL significantly reduces the performance degradation zone caused by the cycling mechanism

Summary

INTRODUCTION

Deep learning is an active field of machine learning whose purpose is to build special deep neural networks (DNNs) [1]. Some existing learning rate schedulers with cycling mechanisms exhibit a large performance degradation zone during convergence. PACL addresses this: the scheduler has the features of a warm restart, re-initializing the learning rate every few epochs or iterations, and it decays the learning rate with a piecewise arc cotangent function within each cycle. Because this decay is rapid, each cycle spends a smaller proportion of its steps at large learning rates. Compared with SGDR and CLR, PACL has a larger proportion of small learning rates, so better accuracy and a more stable training process can be achieved, at almost no extra computing expense.
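The paper's exact formula is not reproduced on this page, so the sketch below only illustrates the general shape such a schedule could take: a warm restart at the start of each cycle followed by an arc-cotangent-shaped decay that falls quickly at first and flattens toward the cycle's end. All parameter names (eta_max, eta_min, cycle_len, and the steepness factor k) are illustrative assumptions, and the piecewise segmentation of the actual PACL schedule is simplified here to a single arc cotangent segment per cycle.

    import math

    def pacl_lr(step, cycle_len=10, eta_max=0.1, eta_min=0.001, k=5.0):
        """Illustrative arc-cotangent-style cyclical decay (a sketch, not
        the paper's exact formula). The learning rate restarts at eta_max
        at the beginning of each cycle and decays toward eta_min following
        arccot(x) = pi/2 - atan(x), which drops steeply near x = 0 and
        flattens out, so each cycle spends only a small proportion of its
        steps at large learning rates.
        """
        t = (step % cycle_len) / cycle_len                         # position in cycle, in [0, 1)
        decay = (math.pi / 2 - math.atan(k * t)) / (math.pi / 2)   # 1 at restart, small near cycle end
        return eta_min + (eta_max - eta_min) * decay

In practice such a step-to-rate function can be plugged into a framework's generic scheduler hook, for example torch.optim.lr_scheduler.LambdaLR in PyTorch, which rescales the optimizer's base learning rate by a user-supplied function of the step.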

RELATED WORKS AND MOTIVATIONS
LEARNING RATE SCHEDULERS
EXPERIMENTS AND ANALYSIS
Findings
CONCLUSION