Abstract

Activation functions are mathematical functions used to activate the neurons of an Artificial Neural Network. Non-linear activation functions mainly help a neural network converge faster while learning and finding patterns in complex input data. A neural network learns by updating its weights via the Back Propagation algorithm, which uses the first-order derivatives of the activation functions to compute the gradients for gradient descent. This paper tests various existing and proposed activation functions on the MNIST and CIFAR-10 datasets for image classification using a shallow Convolutional Neural Network (CNN) architecture. Based on the results, some of the proposed activation functions, including SMod, defined as $x \tanh(x)$, the Absolute/Mod function, a scaled version of Swish, and a few others, are found to be promising. Some of these are then tested on deeper neural networks for various datasets, and the average error rate is observed to improve by 2.77. In addition, suggestions are provided on which activation functions to use in the shallow and deep layers of a Deep Neural Network, resulting in better performance.
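
As an illustration of the activation functions named in the abstract, the sketch below defines them in NumPy. Only SMod is given explicitly ($x \tanh(x)$); the Absolute/Mod function is taken to be $|x|$, and the exact scale factor of the "scaled version of Swish" is not stated in the abstract, so the `scale` parameter here is an assumption for illustration only.

```python
import numpy as np

def smod(x):
    """SMod activation: x * tanh(x), as defined in the abstract."""
    return x * np.tanh(x)

def mod(x):
    """Absolute/Mod activation, assumed to be |x|."""
    return np.abs(x)

def scaled_swish(x, scale=2.0):
    """Swish (x * sigmoid(x)) multiplied by an assumed scale factor;
    the actual scaling used in the paper is not given in the abstract."""
    return scale * x / (1.0 + np.exp(-x))

if __name__ == "__main__":
    xs = np.linspace(-3.0, 3.0, 7)
    print("SMod:        ", smod(xs))
    print("Mod:         ", mod(xs))
    print("scaled Swish:", scaled_swish(xs))
```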
