Abstract
In neural networks, the activation function is a vital component of the learning and inference process. Many approaches exist, but only nonlinear activation functions, often called nonlinearities, allow such networks to solve non-trivial problems with a small number of nodes. With the emergence of deep learning, the need has arisen for capable activation functions that enable or expedite learning in deeper layers. In this paper, we propose a novel activation function that combines features of several successful activation functions, achieving 2.53% higher accuracy than the industry-standard ReLU in a variety of test cases.
Highlights
Activation functions originated from attempts to generalize the linear discriminant function in order to address nonlinear classification problems in pattern recognition.
We take the accuracy achieved by the rectified linear unit (ReLU) as the baseline result, and compute normalized accuracy as the ratio of a new activation function's accuracy to the accuracy achieved by ReLU.
All activation functions perform well, with LeLeLU giving a small boost of 0.23% over the ReLU baseline.
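The normalized-accuracy metric above can be sketched in a few lines. Note that the accuracy values used here are made-up placeholders chosen only to illustrate a 0.23% boost; they are not results from the paper.

```python
def normalized_accuracy(acc_new: float, acc_relu: float) -> float:
    """Ratio of a candidate activation's accuracy to the ReLU baseline.

    A value above 1.0 means the candidate outperforms ReLU.
    """
    return acc_new / acc_relu


# Hypothetical accuracies: ReLU baseline 0.870, candidate activation 0.872.
ratio = normalized_accuracy(0.872, 0.870)
boost_percent = (ratio - 1.0) * 100  # relative boost over the baseline, in %
```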
Summary
The parameter α is learnable per filter during training, and during testing we observed a correlation between dataset complexity, the depth-wise position of the respective filter in the network topology, and the training phase. The strong point of the proposed activation function is that the learnable parameter influences both the negative and the positive values. This implies that the adaptation of α can accelerate training in certain parts of the network during certain epochs of the training procedure, when α takes values larger than 1. The adaptation of the parameter α is investigated in more detail.
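A minimal sketch of an activation of this form, where a learnable α scales both the positive and the negative region, is shown below. The small negative-region slope constant (`leak=0.01`) and the function name are assumptions for illustration; the paper's exact parameterization may differ.

```python
import numpy as np


def lelelu(x, alpha, leak=0.01):
    """Sketch of a learnable leaky activation (assumed form):

        f(x) = alpha * x         for x >= 0
        f(x) = alpha * leak * x  for x <  0

    alpha scales both regions, so alpha > 1 amplifies the gradient
    flowing through this unit and can speed up training locally.
    """
    x = np.asarray(x, dtype=float)
    return alpha * np.where(x >= 0, x, leak * x)


# In a network, alpha would be a trainable parameter per filter,
# updated by backpropagation alongside the weights.
out = lelelu(np.array([-2.0, 0.0, 2.0]), alpha=1.5)
```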