Abstract

The activation function deployed in a deep neural network has great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b² of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in terms of training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.
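To make the central idea concrete, the sketch below shows one way to construct an activation that is exactly linear on an interval around the origin and continues smoothly outside it. This is an illustrative assumption, not the paper's construction: the half-width parameter `a` and the tanh continuation are chosen here for simplicity, with `a` standing in for a linear region sized relative to the bias variance σ_b².

```python
import numpy as np

def linearised_tanh(x, a=1.0):
    """Illustrative activation with an exact linear region of half-width `a`
    around the origin. Outside |x| <= a it continues with a shifted tanh so
    that both the value and the slope (equal to 1) match at the junction.
    In the spirit of the paper, `a` would be chosen large relative to the
    bias variance of the initialisation; the exact tail shape matters less.
    """
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= a
    s = np.sign(x)
    # tanh'(0) = 1, so the slope is continuous at |x| = a.
    outside_val = s * (a + np.tanh(np.abs(x) - a))
    return np.where(inside, x, outside_val)

# Example: the activation is the identity on [-a, a] and saturates outside.
xs = np.linspace(-3.0, 3.0, 7)
print(linearised_tanh(xs, a=1.0))
```

Any activation built this way is a valid instance of "linear near the origin, nonlinear outside"; the abstract's empirical observation is that the behaviour outside the linear region has comparatively little effect on training.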
