Abstract

This study investigates the effectiveness of multiple maxout activation function variants on 18 datasets using Convolutional Neural Networks. A network with maxout activations has a higher number of trainable parameters than a network with traditional activation functions. However, it is not clear whether the activation function itself or the increase in the number of trainable parameters is responsible for yielding the best performance on different entity recognition tasks. This paper investigates whether increasing the number of convolutional filters used with traditional activation functions performs as well as or better than maxout networks. Our experiments compare the Rectified Linear Unit, Leaky Rectified Linear Unit, Scaled Exponential Linear Unit, and Hyperbolic Tangent activations to four maxout variants. We observe that maxout networks train more slowly than networks with traditional activation functions, e.g. the Rectified Linear Unit. In addition, we find that, on average across all datasets, the Rectified Linear Unit activation function performs better than any maxout activation when the number of convolutional filters is increased. Furthermore, adding more filters enhances the classification accuracy of Rectified Linear Unit networks without adversely affecting their advantage over maxout activations with respect to network-training speed.
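
A maxout unit takes the elementwise maximum over several parallel affine feature maps, so each maxout filter carries several times the weights of a single ReLU filter. The following is a minimal PyTorch sketch, not the implementation used in this study; the class name MaxoutConv2d, the number of pieces, and the layer sizes are illustrative assumptions. It contrasts a maxout convolutional block with a plain ReLU block and compares their parameter counts.

```python
# Illustrative sketch (assumed sizes), not the authors' code.
import torch
import torch.nn as nn

class MaxoutConv2d(nn.Module):
    """Maxout over `pieces` parallel convolutions: out = max_j conv_j(x)."""
    def __init__(self, in_channels, out_channels, kernel_size, pieces=2):
        super().__init__()
        # One convolution per linear "piece"; weights grow by a factor of `pieces`.
        self.conv = nn.Conv2d(in_channels, out_channels * pieces, kernel_size)
        self.out_channels = out_channels
        self.pieces = pieces

    def forward(self, x):
        y = self.conv(x)                                   # (N, out*pieces, H, W)
        n, _, h, w = y.shape
        y = y.view(n, self.pieces, self.out_channels, h, w)
        return y.max(dim=1).values                         # elementwise max over pieces

relu_block = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU())
maxout_block = MaxoutConv2d(3, 32, 3, pieces=2)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(relu_block), count(maxout_block))  # maxout holds roughly `pieces` times more weights
```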

Highlights

  • Deep networks have become very useful for many computer vision applications

  • In contrast to the cited papers, we evaluate, with significance testing, whether increasing the number of filters used with the Rectified Linear Unit (ReLU) enhances overall accuracy

  • The results from the image datasets indicate that sextupling the number of convolutional filters in ReLU networks performed better than the rest of the activation functions, but made training more difficult due to the large number of parameters (a parameter-count sketch follows this list)
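
As a back-of-the-envelope illustration (assumed layer sizes, not figures from the paper), the parameter count of a convolutional layer scales linearly with the number of filters, so sextupling the filters sextuples that layer's parameters:

```python
# Illustrative arithmetic with assumed layer sizes, not values from the paper:
# parameters of a conv layer = filters * (in_channels * k * k + 1)  (+1 for the bias)
def conv_params(filters, in_channels, k):
    return filters * (in_channels * k * k + 1)

base = conv_params(32, 3, 3)    # 32 filters  ->   896 parameters
wide = conv_params(192, 3, 3)   # 192 filters -> 5,376 parameters
print(base, wide, wide / base)  # 896 5376 6.0
```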

Introduction

Deep neural networks (DNNs) are models composed of multiple layers that transform input data into outputs while learning increasingly higher-level features. Deep learning relies on learning several levels of hierarchical representations of data. Due to their hierarchical structure, the parameters of a DNN can generally be tuned to approximate target functions more effectively than the parameters of a shallow model [1]. Compared to traditional activation functions, such as logistic sigmoid units or tanh units, which are antisymmetric, ReLU is one-sided. This property encourages the hidden units to be sparse and more biologically plausible [6]. At SemEval-2015 (International Workshop on Semantic Evaluation), Severyn and Moschitti's models ranked first in the phrase-level subtask A and second in the message-level subtask B.
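
A small numerical sketch of the one-sidedness argument (an illustrative assumption, not an experiment from the paper): ReLU maps every negative pre-activation to exactly zero, so roughly half of randomly initialized hidden units are inactive, whereas tanh outputs are antisymmetric around zero and almost never exactly zero.

```python
# Illustrative comparison of ReLU sparsity vs. tanh density (assumed random pre-activations).
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)

relu = np.maximum(0.0, pre_activations)
tanh = np.tanh(pre_activations)

print((relu == 0).mean())  # ~0.5: sparse, one-sided
print((tanh == 0).mean())  # ~0.0: dense, antisymmetric
```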
