Abstract

Activation functions are crucial in deep learning networks, given that their nonlinearity endows deep neural networks with their expressive power. Nonlinear activation functions such as the rectified linear unit (ReLU), hyperbolic tangent (tanh), Sigmoid, Swish, Mish, and Logish perform well in deep learning models; however, only a few of them are widely used across applications because of their remaining shortcomings. Inspired by the MB-C-BSIF method, this study proposes Smish, a novel nonlinear activation function expressed as f(x) = x·tanh[ln(1 + sigmoid(x))], which combines several desirable properties. A logarithmic operation is first applied to compress the range of sigmoid(x); the result is then passed through the tanh operator; finally, the input multiplies this value, which yields regularized negative outputs. Experiments show that Smish tends to operate more efficiently than Logish, Mish, and other activation functions on EfficientNet models with open datasets. Moreover, we evaluated the performance of Smish in various deep learning models and studied the parameters of its generalized form f(x) = αx·tanh[ln(1 + sigmoid(βx))]; with α = 1 and β = 1, Smish exhibited the highest accuracy. The experimental results show that with Smish, the EfficientNetB3 network achieves a Top-1 accuracy of 84.1% on the CIFAR-10 dataset, the EfficientNetB5 network achieves a Top-1 accuracy of 99.89% on the MNIST dataset, and the EfficientNetB7 network achieves a Top-1 accuracy of 91.14% on the SVHN dataset. These values are superior to those obtained with other state-of-the-art activation functions, showing that Smish is well suited to complex deep learning models.
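The page does not include reference code, so as a concrete illustration, the following NumPy sketch (our own naming, not the authors' implementation) simply evaluates the formula f(x) = x·tanh[ln(1 + sigmoid(x))] given above:

    import numpy as np

    def smish(x):
        """Smish activation: f(x) = x * tanh(ln(1 + sigmoid(x)))."""
        sig = 1.0 / (1.0 + np.exp(-x))     # sigmoid(x)
        return x * np.tanh(np.log1p(sig))  # x * tanh(ln(1 + sigmoid(x)))

    x = np.linspace(-6.0, 6.0, 7)
    print(smish(x))  # bounded negative outputs for x < 0, roughly linear growth for x > 0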

Highlights

  • With Smish, EfficientNetB7 achieves a Top-1 accuracy of 91.14% on the SVHN dataset, surpassing other state-of-the-art activation functions and showing that Smish suits complex deep learning models

  • The principle of deep learning networks is that input is passed from one neuron to the next via an activation function, and the process is repeated until the output layer is reached

  • We propose Smish, a new activation function for deep learning, together with its parameterized variant


Summary

Introduction

The principle of deep learning networks is that input is passed from one neuron to the next via an activation function, and the process is repeated until the output layer is reached. Nonlinear activation functions such as Sigmoid, ReLU, Swish, Mish, and Logish are frequently used [8,9]. Sigmoid maps all values to (0, 1), which is associated with the vanishing gradient problem. To address this concern, the tanh activation function was proposed [10]; however, it does not eliminate this problem in deep neural networks. To improve classification accuracy, we designed a new activation function, named Smish, that addresses the aforementioned problems in deep learning networks: it produces negative activation and derivative values while maintaining partial sparsity and a regularization effect for negative inputs. In addition, Smish provides higher learning accuracy than Logish, Mish, Swish, and ReLU when used in several EfficientNet models.
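Since Smish is intended as a drop-in replacement for activations such as ReLU or Swish, the PyTorch sketch below shows one way the parameterized form f(x) = αx·tanh[ln(1 + sigmoid(βx))] could be wrapped as a module; the class, parameter defaults, and layer sizes here are illustrative assumptions, not the authors' published code:

    import torch
    import torch.nn as nn

    class Smish(nn.Module):
        """Parameterized Smish: f(x) = alpha * x * tanh(ln(1 + sigmoid(beta * x)))."""
        def __init__(self, alpha: float = 1.0, beta: float = 1.0):
            super().__init__()
            self.alpha = alpha
            self.beta = beta

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.alpha * x * torch.tanh(torch.log1p(torch.sigmoid(self.beta * x)))

    # Drop-in replacement for ReLU/Swish in a small classifier head;
    # alpha = beta = 1 is the setting reported as most accurate.
    model = nn.Sequential(
        nn.Linear(784, 256),
        Smish(),
        nn.Linear(256, 10),
    )
    logits = model(torch.randn(32, 784))  # output shape: (32, 10)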

Related Work
Logish
Construction of Smish
Curves
Approximation
Nonmonotonicity
Analysis of Hyperparameter Tuning for Smish
Analysis of the Number of Layers
Analysis of Batch Sizes
Analysis of Different Optimizers
Datasets and Experimental Settings
Results on MNIST
Results on SVHN
Conclusions
