Abstract

A key strategy for enabling the training of deep neural networks is to use non-saturating activation functions, which mitigate the vanishing gradient problem. Popular choices that saturate only in the negative domain are the rectified linear unit (ReLU), its smooth, non-linear variant, Softplus, and the exponential linear units (ELU and SELU). Other functions, such as the piecewise-linear parametric ReLU (PReLU), are non-saturating across the entire real domain. Here we introduce a non-linear activation function called Soft++ that extends PReLU and Softplus by parametrizing both the slope in the negative domain and the exponent. We test identical network architectures with ReLU, PReLU, Softplus, ELU, SELU, and Soft++ on several machine learning problems and find that: i) convergence of networks with any activation function depends critically on the particular dataset and network architecture, underscoring the need for parametrization, which allows the activation function to be adapted to the problem at hand; ii) non-linearity around the origin improves learning and generalization; iii) in many cases, non-saturation across the entire real domain further improves performance. On very difficult learning problems with deep fully-connected and convolutional networks, Soft++ outperforms all other activation functions, accelerating learning and improving generalization. Its main advantage lies in its dual parametrization, which offers flexible control of the shape and gradient of the function.
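The abstract does not give the closed form of Soft++; the sketch below is a minimal illustration, assuming a Softplus-like term with an exponent parameter k combined with a linear term of slope 1/c (the two parametrized quantities mentioned above), shifted so the function passes through the origin. The parameter names k and c, and this exact functional form, are assumptions for illustration; the authoritative definition is in the full paper.

```python
import numpy as np

def soft_plus_plus(x, k=1.0, c=2.0):
    """Sketch of a Soft++-style activation (assumed form, not the paper's
    authoritative definition): a Softplus-like term with exponent k plus a
    linear term with slope 1/c, shifted by -ln(2) so that f(0) = 0.
    The linear term keeps the function non-saturating for large negative x."""
    return np.log1p(np.exp(k * x)) + x / c - np.log(2.0)

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    return np.log1p(np.exp(x))

# Compare the three activations on a small grid of inputs.
x = np.linspace(-5.0, 5.0, 11)
print("x       :", np.round(x, 2))
print("ReLU    :", np.round(relu(x), 3))
print("Softplus:", np.round(softplus(x), 3))
print("Soft++  :", np.round(soft_plus_plus(x, k=1.0, c=2.0), 3))
```

Under this assumed form, the Softplus-like term vanishes for large negative inputs, leaving the linear term x/c, so the gradient never saturates; for large positive inputs the slope approaches k + 1/c, and k and c together control the curvature around the origin.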
