Abstract

Deep multi-layer neural networks represent hypotheses of very high polynomial degree that can solve very complex problems. Such deep networks are trained through backpropagation with gradient descent optimization algorithms, which suffer from persistent issues such as the vanishing gradient problem. To overcome the vanishing gradient problem, we introduce a new anti-vanishing backpropagated learning algorithm called oriented stochastic loss descent (OSLD). OSLD iteratively updates each randomly initialized parameter in the direction opposite to the sign of its partial derivative, by a small positive random number scaled by a tuned ratio of the model loss. This paper compares OSLD with stochastic gradient descent (SGD), the basic backpropagation algorithm, and with Adam, one of the best-performing backpropagation optimizers, on five benchmark models. Experimental results show that OSLD is very competitive with Adam in small and moderately deep models, and that OSLD outperforms Adam in very deep models. Moreover, OSLD is compatible with current backpropagation architectures, with the exception of learning rates. Finally, OSLD is stable and opens more choices for training very deep multi-layer neural networks.
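
For illustration, the sketch below shows one possible reading of the update rule described in the abstract: each parameter moves opposite to the sign of its partial derivative by a small positive random number scaled by a tuned ratio of the model loss. The function name `osld_update`, the `loss_ratio` parameter, and the uniform sampling range are assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def osld_update(params, grads, loss, loss_ratio=0.01, rng=None):
    """One OSLD-style update step (sketch based on the abstract only).

    Each parameter is moved opposite to the sign of its partial
    derivative by a small positive random number, scaled by a tuned
    ratio of the current model loss. The sampling range (0, 1e-3)
    and the default loss_ratio are assumed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    step_scale = loss_ratio * loss  # scale random steps by a ratio of the loss
    updated = []
    for p, g in zip(params, grads):
        # small positive random number per parameter entry (assumed range)
        r = rng.uniform(0.0, 1e-3, size=p.shape)
        # step against the gradient sign, ignoring the gradient magnitude
        updated.append(p - np.sign(g) * r * step_scale)
    return updated
```

Note that, unlike SGD, this rule uses only the sign of each partial derivative, not its magnitude, which is consistent with the abstract's claim that OSLD does not rely on a conventional learning rate.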
