Abstract
A key strategy to enable training of deep neural networks is to use non-saturating activation functions to reduce the vanishing gradient problem. Popular choices that saturate only in the negative domain are the rectified linear unit (ReLU), its smooth, non-linear variant, Softplus, and the exponential linear units (ELU and SELU). Other functions are non-saturating across the entire real domain, such as the linear parametric ReLU (PReLU). Here we introduce a nonlinear activation function called Soft++ that extends PReLU and Softplus, parametrizing the slope in the negative domain and the exponent. We test identical network architectures with ReLU, PReLU, Softplus, ELU, SELU, and Soft++ on several machine learning problems and find that: i) convergence of networks with any activation function depends critically on the particular dataset and network architecture, emphasizing the need for parametrization, which allows the activation function to be adapted to the particular problem; ii) non-linearity around the origin improves learning and generalization; iii) in many cases, non-saturation across the entire real domain further improves performance. On very difficult learning problems with deep fully-connected and convolutional networks, Soft++ outperforms all other activation functions, accelerating learning and improving generalization. Its main advantage lies in its dual parametrization, offering flexible control of the shape and gradient of the function.
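To make the dual parametrization concrete, below is a minimal NumPy sketch of one plausible form of Soft++, assuming it combines a Softplus-like term with an exponent parameter k and a PReLU-like linear term with slope 1/c; the names softplusplus, k, and c are illustrative assumptions, and the exact definition given in the paper may differ.

```python
import numpy as np

def softplusplus(x, k=1.0, c=2.0):
    """Hypothetical Soft++ sketch (assumed form, not the paper's definition):
    a Softplus-like term with exponent parameter k plus a linear term with
    slope 1/c. The -log(2) offset makes the function pass through the origin."""
    # logaddexp(0, k*x) computes log(1 + exp(k*x)) in a numerically stable way
    return np.logaddexp(0.0, k * x) + x / c - np.log(2.0)

def softplusplus_grad(x, k=1.0, c=2.0):
    """Gradient of the sketch above: k * sigmoid(k*x) + 1/c, which stays
    bounded away from zero for c > 0, i.e. non-saturating on the whole real line."""
    return k / (1.0 + np.exp(-k * x)) + 1.0 / c

if __name__ == "__main__":
    xs = np.linspace(-5.0, 5.0, 5)
    print(softplusplus(xs))       # negative inputs keep a nonzero response via the x/c term
    print(softplusplus_grad(xs))  # gradient never falls below 1/c for finite x
```

Under this assumed form, k controls the sharpness of the nonlinearity around the origin while c controls the residual slope in the negative domain, which is one way the "flexible control of the shape and gradient" described in the abstract could be realized.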