Abstract

Activation functions are crucial in deep learning networks, given that their nonlinearity endows deep neural networks with their expressive power. Nonlinear activation functions such as the rectified linear unit (ReLU), hyperbolic tangent (tanh), Sigmoid, Swish, Mish, and Logish perform well in deep learning models; however, only a few of them are widely used across applications because of their remaining shortcomings. Inspired by the MB-C-BSIF method, this study proposes Smish, a novel nonlinear activation function expressed as f(x) = x·tanh[ln(1 + sigmoid(x))], which combines several desirable properties. A logarithmic operation is first applied to compress the range of sigmoid(x); the result is then passed through the tanh operator; finally, the input multiplies this value, which yields regularized negative outputs. Experiments show that Smish tends to operate more efficiently than Logish, Mish, and other activation functions on EfficientNet models with open datasets. Moreover, we evaluated the performance of Smish in various deep learning models and studied the parameters of its generalized form f(x) = αx·tanh[ln(1 + sigmoid(βx))]; with α = 1 and β = 1, Smish exhibited the highest accuracy. The experimental results show that with Smish, the EfficientNetB3 network achieves a Top-1 accuracy of 84.1% on the CIFAR-10 dataset, the EfficientNetB5 network achieves a Top-1 accuracy of 99.89% on the MNIST dataset, and the EfficientNetB7 network achieves a Top-1 accuracy of 91.14% on the SVHN dataset. These values are superior to those obtained with other state-of-the-art activation functions, showing that Smish is well suited to complex deep learning models.
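The page does not include reference code, so as a concrete illustration, the following NumPy sketch (our own naming, not the authors' implementation) simply evaluates the formula f(x) = x·tanh[ln(1 + sigmoid(x))] given above:

    import numpy as np

    def smish(x):
        """Smish activation: f(x) = x * tanh(ln(1 + sigmoid(x)))."""
        sig = 1.0 / (1.0 + np.exp(-x))     # sigmoid(x)
        return x * np.tanh(np.log1p(sig))  # x * tanh(ln(1 + sigmoid(x)))

    x = np.linspace(-6.0, 6.0, 7)
    print(smish(x))  # bounded negative outputs for x < 0, roughly linear growth for x > 0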

Highlights

  • With Smish, EfficientNetB7 achieves a Top-1 accuracy of 91.14% on the SVHN dataset, surpassing other state-of-the-art activation functions and showing that Smish suits complex deep learning models

  • The principle of deep learning networks is that input is passed from one neuron to the next via an activation function, and the process is repeated until the output layer is reached

  • We propose Smish, a new activation function for deep learning, together with its parameterized variant


Summary

Introduction

The principle of deep learning networks is that input is passed from one neuron to the next via an activation function, and the process is repeated until the output layer is reached. Nonlinear activation functions such as Sigmoid, ReLU, Swish, Mish, and Logish are frequently used [8,9]. Sigmoid maps all values to (0, 1), which is associated with the vanishing gradient problem. To address this concern, the tanh activation function was proposed [10]; however, it does not eliminate this problem in deep neural networks. To improve classification accuracy, we designed a new activation function, named Smish, that addresses the aforementioned problems in deep learning networks: it produces negative activation and derivative values while maintaining partial sparsity and a regularization effect for negative inputs. In addition, Smish provides higher learning accuracy than Logish, Mish, Swish, and ReLU when used in several EfficientNet models.
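Since Smish is intended as a drop-in replacement for activations such as ReLU or Swish, the PyTorch sketch below shows one way the parameterized form f(x) = αx·tanh[ln(1 + sigmoid(βx))] could be wrapped as a module; the class, parameter defaults, and layer sizes here are illustrative assumptions, not the authors' published code:

    import torch
    import torch.nn as nn

    class Smish(nn.Module):
        """Parameterized Smish: f(x) = alpha * x * tanh(ln(1 + sigmoid(beta * x)))."""
        def __init__(self, alpha: float = 1.0, beta: float = 1.0):
            super().__init__()
            self.alpha = alpha
            self.beta = beta

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.alpha * x * torch.tanh(torch.log1p(torch.sigmoid(self.beta * x)))

    # Drop-in replacement for ReLU/Swish in a small classifier head;
    # alpha = beta = 1 is the setting reported as most accurate.
    model = nn.Sequential(
        nn.Linear(784, 256),
        Smish(),
        nn.Linear(256, 10),
    )
    logits = model(torch.randn(32, 784))  # output shape: (32, 10)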

Related Work
Logish
Construction of Smish
Curves
Approximation
Nonmonotonicity
Analysis of Hyperparameter Tuning for Smish
Analysis of the Number of Layers
Analysis of Batch Sizes
Analysis of Different Optimizers
Datasets and Experimental Settings
Results on MNIST
Results on SVHN
Conclusions
