Abstract

The learning rate is among the most critical hyper-parameters of a neural network and has a significant impact on its performance. This article presents a novel learning rate scheme, termed randomness distribution learning rate (RDLR), to regulate the learning rate value. RDLR shifts the learning rate from a deterministic value to a random variable and sets its value according to the state of the network. To estimate the redundancy of the network, RDLR uses the distances between neurons rather than a covariance matrix, together with a Monte Carlo method, and simplifies each neuron to a point to reduce computational cost. The proposed algorithms do not set the learning rate of each individual epoch; instead, they regulate the mathematical expectation and distribution of the learning rate over the whole training process. With these algorithms, the network can escape local minima or unstable regions and reach the minimum of a region in gradient space. The RDLR algorithms reduce the sensitivity to small changes in the learning rate value and streamline the tuning process of neural networks. RDLR saves computational cost and can work independently or in combination with traditional learning rate algorithms. Used together with a traditional schedule, RDLR can apply the same learning rate strategy to all layers of a network, or keep the mathematical expectation of the learning rate of each layer unchanged while adjusting its impulse. Experiments show that RDLR can improve the performance of a neural network while keeping the other hyper-parameters unchanged. It is a novel method for adjusting the training process by dynamically changing the random distribution of the learning rate. The algorithm monitors the state of the neural network and keeps injecting randomness into training according to the redundancy of the neurons, and it requires no additional hyper-parameters. Experiments show that RDLR improves the performance of neural networks of multiple architectures on various tasks, and that it works with a variety of loss functions and data augmentation methods.
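
As a rough illustration only, the sketch below captures the general idea described in the abstract: each neuron is simplified to a point (its weight vector), redundancy is estimated from a Monte Carlo sample of pairwise distances, and the learning rate used at each step is drawn at random from a distribution whose expectation depends on that redundancy estimate. The function names, the choice of distribution, and the scaling rule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def estimate_redundancy(weight_matrix, n_pairs=256, rng=None):
    """Estimate layer redundancy from pairwise distances between neurons.

    Each neuron is reduced to a point (its weight vector); a Monte Carlo
    sample of neuron pairs keeps the cost low. A small average distance is
    read as high redundancy. (Illustrative assumption, not the paper's formula.)
    """
    rng = rng or np.random.default_rng()
    n_neurons = weight_matrix.shape[0]
    i = rng.integers(0, n_neurons, size=n_pairs)
    j = rng.integers(0, n_neurons, size=n_pairs)
    dists = np.linalg.norm(weight_matrix[i] - weight_matrix[j], axis=1)
    # Map mean distance to a redundancy score in (0, 1]: closer neurons -> more redundant.
    return 1.0 / (1.0 + dists.mean())

def sample_learning_rate(base_lr, redundancy, rng=None):
    """Draw a random learning rate whose expectation depends on redundancy.

    The expectation is scaled up when redundancy is high (helping the network
    jump out of local minima or unstable regions) and the value is sampled from
    an exponential distribution -- both choices are assumptions for illustration.
    """
    rng = rng or np.random.default_rng()
    expected_lr = base_lr * (1.0 + redundancy)
    return rng.exponential(expected_lr)

# Hypothetical use inside a training loop:
# for epoch in range(num_epochs):
#     w = model.layer.weight.detach().cpu().numpy()   # neurons as points
#     lr = sample_learning_rate(0.1, estimate_redundancy(w))
#     for group in optimizer.param_groups:
#         group["lr"] = lr
#     train_one_epoch(model, optimizer, loader)
```

Note that, consistent with the abstract, the sketch does not fix the learning rate of each epoch deterministically; only the expectation of the sampled value is controlled, while the realized value per step remains random.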
