Abstract
A capsule network encodes entity features into capsules and maps spatial relationships from local features to overall features via dynamic routing. This structure allows the capsule network to capture feature information fully, but it inevitably lacks spatial relationship guidance, is sensitive to noise features, and is prone to falling into local optima. We therefore propose a novel capsule network based on feature and spatial relationship coding (FSc-CapsNet). Feature and spatial relationship extractors are introduced to capture features and spatial relationships, respectively: the feature extractor abstracts feature information from the bottom up while attenuating interference from noise features, and the spatial relationship extractor provides spatial relationship guidance from the top down. Then, instead of dynamic routing, a feature and spatial relationship encoder is proposed to find the optimal combination of features and spatial relationships. The encoder abandons iterative optimization and instead folds the optimization process into backpropagation. Experimental results show that, compared with the capsule network and several of its derivatives, the proposed FSc-CapsNet achieves significantly better performance on both the Fashion-MNIST and CIFAR-10 datasets. In addition, compared with several mainstream deep learning frameworks, FSc-CapsNet performs quite competitively on Fashion-MNIST.
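To make the abstract's last architectural claim concrete, the sketch below shows one plausible way to fold the capsule-coupling optimization into backpropagation: the coupling logits become ordinary learnable parameters instead of being recomputed by routing iterations. This is an illustrative assumption, not the paper's actual encoder; the class name `LearnedCouplingCaps`, the tensor shapes, and the initialization are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedCouplingCaps(nn.Module):
    """Hypothetical sketch: couple lower capsules to upper capsules through
    logits that are ordinary parameters trained by backpropagation, with no
    inner routing loop."""
    def __init__(self, num_lower, num_upper, dim_lower, dim_upper):
        super().__init__()
        # W[i, j]: transformation from lower capsule i to its prediction for upper capsule j
        self.W = nn.Parameter(0.01 * torch.randn(num_lower, num_upper, dim_upper, dim_lower))
        # coupling logits, optimized jointly with the rest of the network
        self.logits = nn.Parameter(torch.zeros(num_lower, num_upper))

    @staticmethod
    def squash(s, eps=1e-8):
        # shrink short vectors toward 0 and long vectors toward unit length
        n2 = (s ** 2).sum(dim=-1, keepdim=True)
        return (n2 / (1.0 + n2)) * s / (n2.sqrt() + eps)

    def forward(self, u):  # u: (batch, num_lower, dim_lower)
        # prediction vectors u_hat: (batch, num_lower, num_upper, dim_upper)
        u_hat = torch.einsum('ijdk,bik->bijd', self.W, u)
        # coupling coefficients: softmax over upper capsules for each lower capsule
        c = F.softmax(self.logits, dim=1)
        # weighted sum of predictions, then squash: (batch, num_upper, dim_upper)
        return self.squash(torch.einsum('ij,bijd->bjd', c, u_hat))

# toy usage: 32 lower capsules (dim 8) feeding 10 class capsules (dim 16)
caps = LearnedCouplingCaps(num_lower=32, num_upper=10, dim_lower=8, dim_upper=16)
print(caps(torch.randn(4, 32, 8)).shape)  # torch.Size([4, 10, 16])
```

Because the coupling logits receive gradients like any other weight, no iterative routing is needed at inference time; whether FSc-CapsNet parameterizes its encoder exactly this way is not specified here.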
Highlights
Traditional convolutional neural networks (CNNs) have obvious limitations for exploring spatial relationships.
Higher-layer capsules capture overall features, such as “face” or “car,” while lower-layer capsules capture local entity features, such as “nose,” “mouth,” or “wheels”; this yields a fundamentally different way of abstracting overall features from local features than a convolutional network.
The parameter settings in the first three layers are the same as those in FSc-CapsNet, and dynamic routing is applied between the Prim-Caps layer and the Caps layer (a minimal sketch of this routing procedure follows these highlights).
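For contrast with the baseline described above, here is a minimal NumPy sketch of the routing-by-agreement procedure from the original capsule network (Sabour et al., 2017), as applied between a Prim-Caps layer and a Caps layer; the capsule counts, dimensions, and iteration count are illustrative.

```python
import numpy as np

def squash(s, eps=1e-8):
    # shrink short vectors toward 0 and long vectors toward unit length
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement (Sabour et al., 2017).
    u_hat: prediction vectors from lower capsules, shape (num_lower, num_upper, dim_upper).
    Returns upper-capsule outputs, shape (num_upper, dim_upper)."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))              # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))  # stable softmax over upper capsules
        c = e / e.sum(axis=1, keepdims=True)          # coupling coefficients
        s = np.einsum('ij,ijk->jk', c, u_hat)         # weighted sum per upper capsule
        v = squash(s)                                 # upper-capsule outputs
        b = b + np.einsum('ijk,jk->ij', u_hat, v)     # agreement update
    return v

# toy usage: 8 Prim-Caps predicting 4 Caps of dimension 16
v = dynamic_routing(0.1 * np.random.randn(8, 4, 16))
print(v.shape)  # (4, 16)
```

This inner iterative loop is precisely what the proposed encoder removes, replacing it with a coupling optimized during backpropagation.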
Summary
Traditional convolutional neural networks (CNNs) have obvious limitations for exploring spatial relationships. The usual way to classify images of the same object taken from different angles is to train multiple neurons to process the features and add a top-level detection neuron to detect the classification result. This approach tends to memorize the dataset rather than generalize a solution, and it requires large amounts of training data to cover the different variants and avoid overfitting. As a result, CNNs are very vulnerable to tasks involving moved, rotated, or resized samples. A complete identification process requires both bottom–up feature abstraction and top–down spatial relationship guidance; to achieve more natural recognition, a top–down stream of spatial relationship information is needed.