Abstract

The activation function is a key component in deep learning that performs a non-linear mapping between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can lead to inefficient training of deep neural networks: 1) its negative cancellation property treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) its inherently predefined nature is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) its mean activation is highly positive, leading to a bias shift effect in the network layers; and 4) its multilinear structure restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieved the highest mean rank among the compared methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
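As a quick illustration of shortcomings 1) and 3), the short NumPy snippet below (not taken from the paper) shows that ReLU zeroes out roughly half of zero-mean inputs and yields a strictly positive mean activation, which is the source of the bias shift effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # zero-mean, unit-variance inputs

relu = np.maximum(x, 0.0)          # ReLU discards all negative inputs

print(f"fraction of inputs zeroed out: {(relu == 0).mean():.2f}")  # ~0.50 (negative cancellation)
print(f"mean activation after ReLU:    {relu.mean():.2f}")         # ~0.40 > 0 (bias shift)
```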

Highlights

  • In recent years, deep learning has brought tremendous breakthroughs in artificial intelligence (AI)

  • This study aims to tackle the shortcomings of Rectified Linear Unit (ReLU) by introducing an adaptive non-linear activation function called Parametric Flatten-T Swish (PFTS)

  • A Parametric Flatten-T Swish (PFTS) activation function is presented. This activation function uses a parametric strategy to learn its activation response in each network layer from the inputs (see the sketch after this list)
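To make the parametric strategy concrete, here is a minimal PyTorch-style sketch. It assumes PFTS keeps the Flatten-T Swish shape, f(x) = x·sigmoid(x) + T for x ≥ 0 and f(x) = T otherwise, while learning the threshold T jointly with the network weights; the exact parametrisation and initial value of T used in the paper may differ.

```python
import torch
import torch.nn as nn

class PFTS(nn.Module):
    """Sketch of Parametric Flatten-T Swish (assumed form, not the paper's reference code)."""

    def __init__(self, init_t: float = -0.20):
        super().__init__()
        # Assumption: one learnable threshold T per layer, initialised at the
        # fixed value -0.20 used by the original (non-parametric) Flatten-T Swish.
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x * sigmoid(x) + T in the positive region, flat at T in the negative region.
        positive = x * torch.sigmoid(x) + self.t
        return torch.where(x >= 0, positive, self.t.expand_as(x))
```

Such a layer could simply take the place of nn.ReLU() in a network definition, e.g. nn.Sequential(nn.Linear(784, 256), PFTS(), nn.Linear(256, 10)), so that each layer adapts its own activation response during training.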

Summary

Introduction

Deep learning has brought tremendous breakthroughs in artificial intelligence (AI). Such astonishing advancements are due to several factors: the availability of massive amounts of data, powerful computational hardware such as Graphics Processing Units (GPUs), and deep learning models. The ReLU function is simple and easy to implement in any deep learning model (Lin & Shen, 2018). It keeps the positive inputs and discards the negative inputs. The non-saturation property of ReLU in the positive region ensures smooth gradient flow and avoids vanishing or exploding gradient problems (Nair & Hinton, 2010). In contrast, classical functions such as Sigmoid and Tanh saturate in both the negative and positive regions, which impedes gradient flow during model training.
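The saturation behaviour mentioned above can be read off directly from the derivatives used in backpropagation; the short illustrative NumPy snippet below (not taken from the paper) compares them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

# Derivatives used during backpropagation:
relu_grad    = (x > 0).astype(float)          # stays 1 for positive inputs (non-saturating there)
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))  # -> 0 at both extremes (saturation)
tanh_grad    = 1 - np.tanh(x) ** 2            # -> 0 at both extremes (saturation)

print(relu_grad)     # [0. 0. 0. 1. 1.]
print(sigmoid_grad)  # [~0.00005  0.105  0.25  0.105  ~0.00005]
print(tanh_grad)     # [~0.0      0.071  1.0   0.071  ~0.0]
```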
