Abstract

Activation functions play a critical role in the training and performance of deep convolutional neural networks (CNNs). Currently, the rectified linear unit (ReLU) is the most commonly used activation function for deep CNNs. ReLU is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. In this work, we propose a novel approach to generalize the ReLU activation function using multiple learnable slope parameters. These slope parameters are optimized for every channel, so that a more general activation function (a variant of ReLU) is learned for each channel. This activation is named the fully parametric rectified linear unit (FReLU) and is trained with an alternating optimization technique in which one set of parameters is learned while the other set is kept frozen. Our experiments show that the method outperforms ReLU and its other variants and generalizes across tasks such as image classification, object detection, and action recognition in videos. The Top-1 classification accuracy on ImageNet improves with FReLU by 3.75% for MobileNet and by ~2% for ResNet-50 over ReLU. We also provide various analyses for better interpretability of the proposed activation function.
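
To make the idea concrete, the sketch below shows a per-channel parametric ReLU variant in PyTorch. It is a minimal illustration only, not the paper's exact FReLU formulation: it assumes one learnable slope for the positive part and one for the negative part of each channel, initialized so that the unit starts out identical to standard ReLU. The class name `PerChannelParametricReLU` and the two-slope parameterization are assumptions for illustration.

```python
# Hypothetical sketch of a per-channel activation with learnable slopes,
# in the spirit of the FReLU idea described in the abstract. The exact
# parameterization used in the paper may differ.
import torch
import torch.nn as nn


class PerChannelParametricReLU(nn.Module):
    def __init__(self, num_channels: int):
        super().__init__()
        # Slope applied to positive inputs (standard ReLU uses 1.0).
        self.pos_slope = nn.Parameter(torch.ones(num_channels))
        # Slope applied to negative inputs (standard ReLU uses 0.0).
        self.neg_slope = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reshape slopes to broadcast over (N, C, H, W) feature maps.
        pos = self.pos_slope.view(1, -1, 1, 1)
        neg = self.neg_slope.view(1, -1, 1, 1)
        return torch.where(x > 0, pos * x, neg * x)


# Example usage: replace ReLU after a convolution with the learnable variant.
layer = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), PerChannelParametricReLU(64))
out = layer(torch.randn(2, 3, 32, 32))
```

The abstract's alternating optimization (learning one set of parameters while freezing the other) would be handled in the training loop, e.g. by toggling `requires_grad` on the slope parameters and the convolutional weights in alternating phases; that schedule is not shown in the sketch.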
