Current convolutional neural networks (CNNs) lack viewpoint equivariance and therefore perform poorly on viewpoints unseen during training. In image classification tasks, CNNs achieve invariance through pooling operations, but pooling does not necessarily improve viewpoint generalization; instead, CNNs rely on more training data to generalize across viewpoints. The capsule network (CapsNet) was proposed to tackle this issue, but it is inefficient and inaccurate when applied to complex datasets. We propose a novel CapsNet architecture, called Global Routing CapsNet (GR-CapsNet), to solve this problem. Specifically, a colored background in the input image can generate invalid background voting capsules that degrade the performance of CapsNet. We therefore first construct a dynamic linear unit (DLU) that avoids generating invalid background voting capsules. We then present two additional learnable units: a frequency domain unit (FDU) and a spatial unit (SPU). The former captures finer features in the frequency domain and aims to improve classification performance on complex datasets; the latter constructs the spatial relationship between the voting capsules and component capsules and aims to enhance robustness to affine transformations. Finally, we propose a global routing mechanism that simplifies the routing process of CapsNet and obtains more feature information to improve its performance. Extensive experiments on nine datasets show that our method achieves better robustness and generalization and attains state-of-the-art performance compared with other related methods, while requiring fewer parameters and less GPU memory than these methods. The source code is available at https://github.com/cwpl/GR-CapsNet.