Abstract

By applying concepts from the statistical physics of learning, we study layered neural networks of rectified linear units (ReLU). The comparison with conventional, sigmoidal activation functions is at the center of interest. We compute typical learning curves for large shallow networks with K hidden units in matching student-teacher scenarios. The systems undergo phase transitions, i.e. sudden changes of the generalization performance via the process of hidden-unit specialization at critical sizes of the training set. Surprisingly, our results show that the training behavior of ReLU networks is qualitatively different from that of networks with sigmoidal activations. In networks with K ≥ 3 sigmoidal hidden units, the transition is discontinuous: specialized network configurations co-exist and compete with states of poor performance even for very large training sets. In contrast, the use of ReLU activations results in continuous transitions for all K. For large enough training sets, two competing, differently specialized states display similar generalization abilities, which coincide exactly for large hidden layers in the limit K → ∞. Our findings are also confirmed in Monte Carlo simulations of the training processes.
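For concreteness, the following is a minimal sketch of such a matching student-teacher scenario. It assumes a soft-committee-machine architecture (the network output is the sum of the K hidden-unit activations) with ReLU units, i.i.d. Gaussian inputs, and plain online gradient descent on the quadratic error; the learning rate and training-set size are arbitrary illustrative choices, and this is only a sketch of the setup, not the statistical-physics analysis carried out in the paper.

```python
# Illustrative student-teacher setup with ReLU soft committee machines.
# All concrete parameter values are arbitrary choices for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def committee(W, X):
    """Soft committee machine: sum of hidden-unit activations g(w_k . x)."""
    return relu(X @ W.T).sum(axis=1)

N, K, P = 100, 3, 5000                        # input dim., hidden units, examples
B = rng.standard_normal((K, N)) / np.sqrt(N)  # teacher weights (fixed)
W = rng.standard_normal((K, N)) / np.sqrt(N)  # student weights (adaptive)

X = rng.standard_normal((P, N))               # i.i.d. Gaussian training inputs
y = committee(B, X)                           # noise-free teacher labels

eta = 0.5 / N                                 # small learning rate (arbitrary)
for x_mu, y_mu in zip(X, y):
    h = W @ x_mu                              # hidden-unit pre-activations
    err = relu(h).sum() - y_mu                # student output minus teacher label
    W -= eta * err * (h > 0)[:, None] * x_mu[None, :]   # ReLU derivative is 1{h > 0}

# generalization error estimated on fresh examples
X_test = rng.standard_normal((5000, N))
eg = 0.5 * np.mean((committee(W, X_test) - committee(B, X_test)) ** 2)
print(f"estimated generalization error: {eg:.4f}")
```

In the language of the abstract, hidden-unit specialization means that each student weight vector w_k aligns with a distinct teacher vector B_m; the sketch above only sets up the scenario in which such transitions can be studied.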

Highlights

  • The re-gained interest in artificial neural networks [1,2,3,4,5] is largely due to the successful application of so-called Deep Learning in a number of practical contexts, see e.g. [6,7,8] for reviews and further references.

  • The sigmoidal activation saturates at the limiting values 0 and 2 for small and large arguments, respectively; this choice is arbitrary and irrelevant for the qualitative results of our analyses. The Rectified Linear Unit (ReLU) activation, g(x) = max{0, x}, is a simple, piece-wise linear transfer function that has attracted considerable attention in the context of multi-layered neural networks (see the sketch after this list).

  • We have investigated the training of shallow, layered neural networks in student-teacher scenarios of matching complexity.
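The two transfer functions quoted in the highlights can be written out explicitly as below. The ReLU, g(x) = max{0, x}, is given in the text; the erf-based form of the sigmoidal is an assumption chosen to be consistent with the stated limiting values 0 and 2, not a quote from the paper.

```python
# The two activation functions compared in the study (erf-based sigmoidal
# form is an assumption consistent with the stated limits 0 and 2).
import numpy as np
from scipy.special import erf

def sigmoidal(x):
    # saturates at 0 for x -> -infinity and at 2 for x -> +infinity
    return 1.0 + erf(x / np.sqrt(2.0))

def relu(x):
    return np.maximum(0.0, x)

for x in (-5.0, 0.0, 5.0):
    print(x, sigmoidal(x), relu(x))
```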



Introduction

The re-gained interest in artificial neural networks [1,2,3,4,5] is largely due to the successful application of so-called Deep Learning in a number of practical contexts, see e.g. [6,7,8] for reviews and further references. The successful training of powerful, multi-layered deep networks has become feasible for a number of reasons, including the automated acquisition of large amounts of training data in various domains, the use of modified and optimized architectures, e.g. convolutional networks for image processing, and the ever-increasing availability of computational power needed for the implementation of efficient training. One important modification of earlier models is the use of alternative activation functions [6,9,10]. Compared to more traditional activation functions, the simple ReLU and recently suggested modifications warrant computational ease and appear to speed up the training, see for instance [11,14,15]. The problem of vanishing gradients, which arises when applying the chain rule in layered networks, is also less severe for the ReLU, whose derivative does not saturate for positive arguments.
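Since the paragraph above invokes the vanishing-gradient problem, the following toy computation makes the mechanism explicit: back-propagating an error signal multiplies the activation derivative g'(h) into the chain rule once per layer. The depth, width, weight scale, and the erf-based sigmoidal are assumptions chosen for illustration only.

```python
# Toy illustration of the vanishing-gradient effect: saturating sigmoidal
# units shrink the back-propagated signal, while the ReLU derivative is
# exactly 1 for positive pre-activations.  Depth, width and weight scale
# are arbitrary choices for this sketch.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(1)

def backprop_norm(g, gprime, depth=20, width=50):
    """Forward-propagate a random input, back-propagate a unit error signal,
    and return the norm of the gradient arriving at the first layer."""
    x = rng.standard_normal(width)
    layers = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = W @ x
        layers.append((W, h))
        x = g(h)
    delta = np.ones(width)                    # error signal at the output layer
    for W, h in reversed(layers):
        delta = W.T @ (delta * gprime(h))     # one chain-rule step per layer
    return np.linalg.norm(delta)

sigmoidal       = lambda h: 1.0 + erf(h / np.sqrt(2.0))
sigmoidal_prime = lambda h: np.sqrt(2.0 / np.pi) * np.exp(-h**2 / 2.0)
relu            = lambda h: np.maximum(0.0, h)
relu_prime      = lambda h: (h > 0).astype(float)

print("sigmoidal:", backprop_norm(sigmoidal, sigmoidal_prime))
print("ReLU:     ", backprop_norm(relu, relu_prime))
```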

