Abstract

In artificial neural networks, activation functions play a significant role in the learning process. Choosing a proper activation function is a major factor in achieving successful learning performance. Many activation functions suffice for universal approximation, but their practical performance is lacking. Thus, many efforts have been directed toward activation functions to improve the learning performance of artificial neural networks. However, the learning process involves many challenges, such as saturation, dying neurons, and exploding/vanishing gradients. The contribution of this work resides in several axes. First, we introduce two novel activation functions: absolute linear units and inverse polynomial linear units. Both activation functions are augmented by an adjustable parameter that controls the slope of the gradient. Second, we present a comprehensive study and a taxonomy of various types of activation functions. Third, we conduct a broad range of experiments on several deep neural architecture models, taking network type and depth into consideration. Fourth, we evaluate the proposed activation functions' performance on image and text classification tasks. For this purpose, several public benchmark datasets are used to evaluate and compare the performance of the proposed functions with that of a group of common activation functions. Finally, we analyze in depth the impact of several common activation functions on deep network architectures. The results reveal that the proposed functions outperform most of the popular activation functions on several benchmarks. A statistical study of the overall experiments on both classification categories indicates that the proposed activation functions are robust and superior to all competing activation functions in terms of average accuracy.
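This excerpt does not give the closed-form definitions of the proposed absolute linear units or inverse polynomial linear units. The sketch below is therefore only a generic, hypothetical PyTorch illustration of the broader idea mentioned in the abstract, an activation augmented with a learnable parameter that controls the slope of the gradient (in the spirit of PReLU); it is not the paper's AbsLU or IpLU formulation.

```python
import torch
import torch.nn as nn

class LearnableSlopeActivation(nn.Module):
    """Generic activation with a trainable slope parameter.

    Illustrative only: the actual AbsLU/IpLU definitions are not
    given in this excerpt.
    """
    def __init__(self, init_slope: float = 0.25):
        super().__init__()
        # The slope is a learnable scalar, updated by backpropagation.
        self.slope = nn.Parameter(torch.tensor(init_slope))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity for positive inputs, scaled response for negative ones,
        # so the gradient stays nonzero everywhere.
        return torch.where(x >= 0, x, self.slope * x)
```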

Highlights

  • The successful application of machine learning techniques relies on the approximation functions learned from the underlying data of a problem [1]

  • The results show that, as the number of layers in recurrent neural networks (RNNs) and fully connected neural networks (FCNNs) increases, IpLU and the absolute linear unit (AbsLU) achieve a significant improvement relative to other activation functions

  • The analysis reveals that the inverse polynomial linear unit (IpLU) achieves strong performance with the baseline convolutional neural network (CNN), residual networks, and dense networks

Introduction

The successful application of machine learning techniques relies on the approximation functions learned from the underlying data of a problem [1]. Complex problems involve high-dimensional nonlinear data. To learn such problems effectively, artificial neural networks (ANNs) should use nonlinear activation functions (AFs) in their hidden layers [14], [15]. Empirical studies have shown that nonlinear mappings are easier to optimize and converge rapidly [16], [17]. For this purpose, many popular nonlinear activation functions, such as Sigmoid, Tanh, SoftPlus, and ArcTan [20], have been used to fulfill the conditions of the universal approximation theorem [18], [19]. These functions still cause several problems, including vanishing gradients and saturation [21], which lead to poor performance and undesirable learning behavior.
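For reference, the classical nonlinear activation functions named above have well-known closed forms. The minimal NumPy sketch below (an illustration, not code from the paper) shows Sigmoid, Tanh, SoftPlus, and ArcTan, whose bounded or flattening tails are what give rise to the saturation and vanishing-gradient issues discussed here.

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^{-x}); saturates toward 0 and 1, so its gradient vanishes for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered but also saturates toward -1 and 1.
    return np.tanh(x)

def softplus(x):
    # log(1 + e^x); a smooth, non-saturating-on-the-right approximation of ReLU.
    return np.log1p(np.exp(x))

def arctan(x):
    # Bounded in (-pi/2, pi/2); gradient 1/(1 + x^2) decays for large |x|.
    return np.arctan(x)

x = np.linspace(-6, 6, 5)
print(sigmoid(x), tanh(x), softplus(x), arctan(x), sep="\n")
```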
