Abstract

The activation function is a key component in deep learning that performs a non-linear mapping between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can lead to inefficient training of deep neural networks: 1) its negative cancellation property treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) its inherently predefined nature is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) its mean activation is highly positive, leading to a bias shift effect in the network layers; and 4) its multilinear structure restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieved the highest mean rank among the compared methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
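As a quick illustration of shortcomings 1) and 3), the short NumPy snippet below (not taken from the paper) shows that ReLU zeroes out roughly half of zero-mean inputs and yields a strictly positive mean activation, which is the source of the bias shift effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # zero-mean, unit-variance inputs

relu = np.maximum(x, 0.0)          # ReLU discards all negative inputs

print(f"fraction of inputs zeroed out: {(relu == 0).mean():.2f}")  # ~0.50 (negative cancellation)
print(f"mean activation after ReLU:    {relu.mean():.2f}")         # ~0.40 > 0 (bias shift)
```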

Highlights

  • In recent years, deep learning has brought tremendous breakthroughs in artificial intelligence (AI)

  • This study aims to tackle the shortcomings of Rectified Linear Unit (ReLU) by introducing an adaptive non-linear activation function called Parametric Flatten-T Swish (PFTS)

  • A Parametric Flatten-T Swish (PFTS) activation function is presented. This activation function uses a parametric strategy to learn its activation response in each network layer from the inputs (see the sketch after this list)
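To make the parametric strategy concrete, here is a minimal PyTorch-style sketch. It assumes PFTS keeps the Flatten-T Swish shape, f(x) = x·sigmoid(x) + T for x ≥ 0 and f(x) = T otherwise, while learning the threshold T jointly with the network weights; the exact parametrisation and initial value of T used in the paper may differ.

```python
import torch
import torch.nn as nn

class PFTS(nn.Module):
    """Sketch of Parametric Flatten-T Swish (assumed form, not the paper's reference code)."""

    def __init__(self, init_t: float = -0.20):
        super().__init__()
        # Assumption: one learnable threshold T per layer, initialised at the
        # fixed value -0.20 used by the original (non-parametric) Flatten-T Swish.
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x * sigmoid(x) + T in the positive region, flat at T in the negative region.
        positive = x * torch.sigmoid(x) + self.t
        return torch.where(x >= 0, positive, self.t.expand_as(x))
```

Such a layer could simply take the place of nn.ReLU() in a network definition, e.g. nn.Sequential(nn.Linear(784, 256), PFTS(), nn.Linear(256, 10)), so that each layer adapts its own activation response during training.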

Summary

Introduction

Deep learning has brought tremendous breakthroughs in artificial intelligence (AI). Such astonishing advancements are due to several factors: the availability of massive amounts of data, powerful computational hardware such as Graphics Processing Units (GPUs), and deep learning models. The ReLU function is simple and easy to implement in any deep learning model (Lin & Shen, 2018). It keeps the positive inputs and discards the negative inputs. The non-saturation property of ReLU in the positive region ensures smooth gradient flow and avoids vanishing or exploding gradient problems (Nair & Hinton, 2010). In contrast, classical functions such as Sigmoid and Tanh saturate in both the negative and positive regions, which impedes gradient flow during model training.
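The saturation behaviour mentioned above can be read off directly from the derivatives used in backpropagation; the short illustrative NumPy snippet below (not taken from the paper) compares them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

# Derivatives used during backpropagation:
relu_grad    = (x > 0).astype(float)          # stays 1 for positive inputs (non-saturating there)
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))  # -> 0 at both extremes (saturation)
tanh_grad    = 1 - np.tanh(x) ** 2            # -> 0 at both extremes (saturation)

print(relu_grad)     # [0. 0. 0. 1. 1.]
print(sigmoid_grad)  # [~0.00005  0.105  0.25  0.105  ~0.00005]
print(tanh_grad)     # [~0.0      0.071  1.0   0.071  ~0.0]
```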
