Abstract

By applying concepts from the statistical physics of learning, we study layered neural networks of rectified linear units (ReLU). The comparison with conventional, sigmoidal activation functions is at the center of interest. We compute typical learning curves for large shallow networks with K hidden units in matching student-teacher scenarios. The systems undergo phase transitions, i.e. sudden changes of the generalization performance via the process of hidden-unit specialization at critical sizes of the training set. Surprisingly, our results show that the training behavior of ReLU networks is qualitatively different from that of networks with sigmoidal activations. In networks with K ≥ 3 sigmoidal hidden units, the transition is discontinuous: specialized network configurations co-exist and compete with states of poor performance even for very large training sets. In contrast, the use of ReLU activations results in continuous transitions for all K. For large enough training sets, two competing, differently specialized states display similar generalization abilities, which coincide exactly for large hidden layers in the limit K → ∞. Our findings are also confirmed in Monte Carlo simulations of the training processes.
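For concreteness, the following is a minimal sketch of such a matching student-teacher scenario. It assumes a soft-committee-machine architecture (the network output is the sum of the K hidden-unit activations) with ReLU units, i.i.d. Gaussian inputs, and plain online gradient descent on the quadratic error; the learning rate and training-set size are arbitrary illustrative choices, and this is only a sketch of the setup, not the statistical-physics analysis carried out in the paper.

```python
# Illustrative student-teacher setup with ReLU soft committee machines.
# All concrete parameter values are arbitrary choices for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def committee(W, X):
    """Soft committee machine: sum of hidden-unit activations g(w_k . x)."""
    return relu(X @ W.T).sum(axis=1)

N, K, P = 100, 3, 5000                        # input dim., hidden units, examples
B = rng.standard_normal((K, N)) / np.sqrt(N)  # teacher weights (fixed)
W = rng.standard_normal((K, N)) / np.sqrt(N)  # student weights (adaptive)

X = rng.standard_normal((P, N))               # i.i.d. Gaussian training inputs
y = committee(B, X)                           # noise-free teacher labels

eta = 0.5 / N                                 # small learning rate (arbitrary)
for x_mu, y_mu in zip(X, y):
    h = W @ x_mu                              # hidden-unit pre-activations
    err = relu(h).sum() - y_mu                # student output minus teacher label
    W -= eta * err * (h > 0)[:, None] * x_mu[None, :]   # ReLU derivative is 1{h > 0}

# generalization error estimated on fresh examples
X_test = rng.standard_normal((5000, N))
eg = 0.5 * np.mean((committee(W, X_test) - committee(B, X_test)) ** 2)
print(f"estimated generalization error: {eg:.4f}")
```

In the language of the abstract, hidden-unit specialization means that each student weight vector w_k aligns with a distinct teacher vector B_m; the sketch above only sets up the scenario in which such transitions can be studied.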

Highlights

  • The re-gained interest in artificial neural networks [1,2,3,4,5] is largely due to the successful application of so-called Deep Learning in a number of practical contexts, see e.g. [6,7,8] for reviews and further references.

  • The sigmoidal activation saturates at the limiting values 0 and 2 for small and large arguments, respectively; this choice is arbitrary and irrelevant for the qualitative results of our analyses. The Rectified Linear Unit (ReLU) activation, g(x) = max{0, x}, is a simple, piece-wise linear transfer function that has attracted considerable attention in the context of multi-layered neural networks (see the sketch after this list).

  • We have investigated the training of shallow, layered neural networks in student-teacher scenarios of matching complexity.
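The two transfer functions quoted in the highlights can be written out explicitly as below. The ReLU, g(x) = max{0, x}, is given in the text; the erf-based form of the sigmoidal is an assumption chosen to be consistent with the stated limiting values 0 and 2, not a quote from the paper.

```python
# The two activation functions compared in the study (erf-based sigmoidal
# form is an assumption consistent with the stated limits 0 and 2).
import numpy as np
from scipy.special import erf

def sigmoidal(x):
    # saturates at 0 for x -> -infinity and at 2 for x -> +infinity
    return 1.0 + erf(x / np.sqrt(2.0))

def relu(x):
    return np.maximum(0.0, x)

for x in (-5.0, 0.0, 5.0):
    print(x, sigmoidal(x), relu(x))
```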



Introduction

The re-gained interest in artificial neural networks [1,2,3,4,5] is largely due to the successful application of so-called Deep Learning in a number of practical contexts, see e.g. [6,7,8] for reviews and further references. The successful training of powerful, multi-layered deep networks has become feasible for a number of reasons, including the automated acquisition of large amounts of training data in various domains, the use of modified and optimized architectures, e.g. convolutional networks for image processing, and the ever-increasing availability of computational power needed for the implementation of efficient training. One important modification of earlier models is the use of alternative activation functions [6,9,10]. Compared to more traditional activation functions, the simple ReLU and recently suggested modifications warrant computational ease and appear to speed up the training, see for instance [11,14,15]. The problem of vanishing gradients, which arises when applying the chain rule in layered networks, is also less severe for the ReLU, whose derivative does not saturate for positive arguments.
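Since the paragraph above invokes the vanishing-gradient problem, the following toy computation makes the mechanism explicit: back-propagating an error signal multiplies the activation derivative g'(h) into the chain rule once per layer. The depth, width, weight scale, and the erf-based sigmoidal are assumptions chosen for illustration only.

```python
# Toy illustration of the vanishing-gradient effect: saturating sigmoidal
# units shrink the back-propagated signal, while the ReLU derivative is
# exactly 1 for positive pre-activations.  Depth, width and weight scale
# are arbitrary choices for this sketch.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(1)

def backprop_norm(g, gprime, depth=20, width=50):
    """Forward-propagate a random input, back-propagate a unit error signal,
    and return the norm of the gradient arriving at the first layer."""
    x = rng.standard_normal(width)
    layers = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = W @ x
        layers.append((W, h))
        x = g(h)
    delta = np.ones(width)                    # error signal at the output layer
    for W, h in reversed(layers):
        delta = W.T @ (delta * gprime(h))     # one chain-rule step per layer
    return np.linalg.norm(delta)

sigmoidal       = lambda h: 1.0 + erf(h / np.sqrt(2.0))
sigmoidal_prime = lambda h: np.sqrt(2.0 / np.pi) * np.exp(-h**2 / 2.0)
relu            = lambda h: np.maximum(0.0, h)
relu_prime      = lambda h: (h > 0).astype(float)

print("sigmoidal:", backprop_norm(sigmoidal, sigmoidal_prime))
print("ReLU:     ", backprop_norm(relu, relu_prime))
```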

