Abstract

Classical Convolutional Neural Networks, or ConvNets, have set the benchmarks for most object classification and face recognition tasks despite major limitations, such as their inability to capture spatial co-locality between data points and their preference for invariance over equivariance. To overcome these shortcomings, a layered architecture with hierarchical routing, called Capsule Networks, was proposed. Capsules replace the average- and max-pooling operations of ConvNets with dynamic routing between lower-level and higher-level neural units, which better captures hierarchical relationships within the data, and introduce a reconstruction regularization mechanism that addresses equivariance properties. By overcoming these limitations, Capsules have proven to be promising candidates for object segmentation, detection, and reconstruction. Capsules have achieved state-of-the-art results on the fundamental MNIST (Modified National Institute of Standards and Technology) handwritten digit dataset, reducing the ConvNet test-error benchmark from 0.39% to 0.25%. To further improve on this result, we experimented with five activation units: sigmoid, e-Swish, Swish, variants of the Rectified Linear Unit (ReLU) such as Parametric ReLU (PReLU) and leaky ReLU (lReLU), and Scaled Exponential Linear Units (SELU), on two fundamental datasets, MNIST and CIFAR-10. Based on these experimental results, we establish that e-Swish and the ReLU variants optimize the Capsule architecture better than the currently used ReLU activation function.
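
For concreteness, the candidate activations can be written compactly. The sketch below is a minimal illustration in PyTorch (an assumed framework, as the abstract does not name an implementation); the beta values shown are illustrative defaults, not the settings tuned in the experiments.

    import torch
    import torch.nn.functional as F

    def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
        # Swish: x * sigmoid(beta * x); beta = 1 gives the common SiLU form.
        return x * torch.sigmoid(beta * x)

    def e_swish(x: torch.Tensor, beta: float = 1.375) -> torch.Tensor:
        # e-Swish: beta * x * sigmoid(x); beta is a fixed scaling constant
        # (the value here is illustrative, not necessarily the paper's choice).
        return beta * x * torch.sigmoid(x)

    # The remaining candidates ship with PyTorch:
    #   torch.sigmoid(x)                      # sigmoid
    #   F.relu(x)                             # baseline ReLU
    #   F.leaky_relu(x, negative_slope=0.01)  # leaky ReLU (lReLU)
    #   F.prelu(x, weight)                    # PReLU, with learnable slope `weight`
    #   F.selu(x)                             # SELU

Swapping these functions in for ReLU inside the capsule layers is what allows a like-for-like comparison of how each activation shapes optimization of the same architecture.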
