Abstract

Single-label image classification has been studied extensively in recent years. However, in public datasets, images carrying multiple labels greatly outnumber single-labeled images, which makes multi-label image classification an increasingly important problem. Most published networks for multi-label image classification use a CNN with a sigmoid output layer, in contrast to single-label classification networks, which use a CNN with a softmax layer; binary cross-entropy is commonly used as the loss function. However, because of complex underlying object layouts and the feature confusion caused by multiple tags, the performance of a CNN with a sigmoid layer on multi-label image classification is not satisfactory. Recently, capsule networks have been proposed to overcome some limitations of CNNs. In this paper, a capsule network layer is used to replace the traditional fully-connected layer and the sigmoid layer of the CNN to improve multi-label image classification. To address the convergence problems of deep networks trained on insufficient data, fine-tuning techniques for DCNNs are applied to the capsule network architecture. Experiments are conducted on three datasets: PASCAL VOC 2007, PASCAL VOC 2012, and NUS-WIDE. The proposed CNN+Capsule architecture is compared with the traditional CNN+FullyConnected architecture and, under different parameter settings, consistently achieves better performance.
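To make the described architecture change concrete, the following is a minimal sketch (not the authors' implementation) of a CNN whose fully-connected + sigmoid head is replaced by a capsule head: each class is represented by a capsule vector whose length serves as the class presence score, trained with binary cross-entropy. The ResNet-50 backbone, 20 classes (as in PASCAL VOC), 16-dimensional capsules, and the omission of dynamic routing are all simplifying assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a CNN+Capsule head for multi-label classification.
import torch
import torch.nn as nn
import torchvision.models as models


def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity: preserves direction, maps length into (0, 1)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class CapsuleHead(nn.Module):
    """Maps backbone features to one capsule per class (routing omitted for brevity)."""
    def __init__(self, in_features, num_classes=20, capsule_dim=16):
        super().__init__()
        self.num_classes = num_classes
        self.capsule_dim = capsule_dim
        self.fc = nn.Linear(in_features, num_classes * capsule_dim)

    def forward(self, x):
        caps = self.fc(x).view(-1, self.num_classes, self.capsule_dim)
        caps = squash(caps)
        return caps.norm(dim=-1)  # per-class presence scores in (0, 1)


class CNNCapsule(nn.Module):
    """Pretrained CNN backbone (fine-tuned) followed by a capsule output layer."""
    def __init__(self, num_classes=20):
        super().__init__()
        backbone = models.resnet50(weights="IMAGENET1K_V1")  # assumed backbone
        in_features = backbone.fc.in_features
        backbone.fc = nn.Identity()  # drop the original fully-connected head
        self.backbone = backbone
        self.head = CapsuleHead(in_features, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


# Multi-label training with binary cross-entropy on the capsule lengths.
model = CNNCapsule(num_classes=20)
scores = model(torch.randn(2, 3, 224, 224))      # shape (2, 20), values in (0, 1)
targets = torch.randint(0, 2, (2, 20)).float()   # multi-hot ground-truth labels
loss = nn.functional.binary_cross_entropy(scores, targets)
```

Because the squashed capsule lengths already lie in (0, 1), they can be fed to binary cross-entropy directly, playing the role that sigmoid activations play in the conventional CNN+FullyConnected baseline.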
