Abstract

Classical Convolutional Neural Networks (CNNs) have been the benchmark for most object classification and face recognition tasks despite major shortcomings, including an inability to capture the spatial co-location of features and a preference for invariance over equivariance. To overcome these shortcomings, the hierarchically routed, layered architecture of Capsule Networks (CapsNets) was developed. Capsules replace the average- or max-pooling used in CNNs with dynamic routing between lower-level and higher-level neural units, and they introduce a reconstruction-based regularization mechanism that preserves equivariance and improves hierarchical data representation. Because capsules can overcome these existing limitations, they are potential benchmarks for detecting, segmenting, and reconstructing objects. On the fundamental MNIST handwritten digit dataset, CapsNets demonstrated state-of-the-art results. Using two fundamental datasets, MNIST and CIFAR-10, we investigated a number of squash functions to further enhance this performance. Compared to the models of Sabour and Edgar, the optimized squash function performs marginally better and produces fewer test errors. It also converges faster than both baseline squash functions, is more efficient and scalable, and can be trained in any capsule-based neural network.
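For context, the baseline squash nonlinearity of Sabour et al. (2017) rescales a capsule's input vector $\mathbf{s}_j$ so that short vectors shrink toward zero length and long vectors approach unit length while keeping their orientation: $\mathbf{v}_j = \frac{\lVert\mathbf{s}_j\rVert^2}{1+\lVert\mathbf{s}_j\rVert^2}\,\frac{\mathbf{s}_j}{\lVert\mathbf{s}_j\rVert}$. The abstract does not specify the optimized squash function itself, so the following is only a minimal sketch of this standard baseline in PyTorch; the function name, tensor shapes, and `eps` stabilizer are illustrative assumptions, not the paper's implementation.

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Baseline squash from Sabour et al. (2017), sketched for reference.

    Maps each capsule vector s to a vector of the same orientation whose
    length lies in [0, 1):  v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    `eps` (an illustrative assumption) guards against division by zero.
    """
    squared_norm = (s ** 2).sum(dim=dim, keepdim=True)
    norm = squared_norm.sqrt()
    return (squared_norm / (1.0 + squared_norm)) * (s / (norm + eps))

# Usage sketch: 32 primary capsules, each an 8-dimensional pose vector.
capsules = torch.randn(32, 8)
print(squash(capsules).norm(dim=-1))  # all lengths fall in [0, 1)
```

Because the output length acts as an existence probability during dynamic routing, variants of this function mainly differ in how aggressively they saturate; the optimized squash studied here is compared against such baselines on MNIST and CIFAR-10.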
