An Analysis of Activation Function Saturation in Particle Swarm Optimization Trained Neural Networks

Cody Dennis, Beatrice M Ombuki-Berman, Andries P Engelbrecht

doi:10.1007/s11063-020-10290-z

https://doi.org/10.1007/s11063-020-10290-z

Abstract

The activation functions used in an artificial neural network define how the nodes of the network respond to input, directly influence the shape of the error surface, and affect the difficulty of the neural network training problem. The choice of activation function is therefore a significant question that must be addressed when applying a neural network to a problem. One issue to consider when selecting an activation function is activation function saturation. Saturation occurs when a bounded activation function primarily outputs values close to its boundary. Excessive saturation damages the network's ability to encode information and may prevent successful training. Common functions such as the logistic and hyperbolic tangent functions have been shown to exhibit saturation when the neural network is trained using particle swarm optimization. This study proposes a new measure of activation function saturation, evaluates the saturation behavior of eight common activation functions, and evaluates six mechanisms for controlling activation function saturation in particle swarm optimization-based neural network training. Activation functions that result in low levels of saturation are identified. For each activation function, recommendations are made regarding which saturation control mechanism is most effective at reducing saturation.
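To make the definition of saturation concrete, the following is a minimal illustrative sketch, not the measure proposed in the study. It assumes a simple proxy: the fraction of a bounded activation function's outputs that fall within a small margin of either bound. The function name saturation_fraction, the tolerance parameter tol, and the sample net input values are all hypothetical and chosen only for illustration.

```python
import math


def saturation_fraction(outputs, lower, upper, tol=0.05):
    """Return the fraction of outputs within tol * (upper - lower) of a bound.

    This is an assumed, simplified proxy for saturation, not the measure
    proposed in the paper.
    """
    if not outputs:
        raise ValueError("outputs must not be empty")
    margin = tol * (upper - lower)
    near_bound = sum(
        1 for y in outputs if (y - lower) <= margin or (upper - y) <= margin
    )
    return near_bound / len(outputs)


# Hypothetical net inputs (weighted sums) arriving at one tanh hidden unit.
small_inputs = [-0.5, -0.1, 0.0, 0.2, 0.4]   # small weights: outputs stay mid-range
large_inputs = [-9.0, -6.0, 5.0, 7.0, 0.1]   # large weights: outputs pile up at -1 and 1

small = saturation_fraction([math.tanh(x) for x in small_inputs], -1.0, 1.0)
large = saturation_fraction([math.tanh(x) for x in large_inputs], -1.0, 1.0)

print(small)  # no output lies near a bound
print(large)  # four of the five outputs lie near a bound
```

Under this proxy, a unit whose net inputs have large magnitude reports a high fraction because the hyperbolic tangent flattens toward -1 and 1. This is the situation the abstract describes as harmful to the network's ability to encode information.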

Full Text