Abstract

Convolutional neural networks (CNNs) achieve outstanding results in single-label image classification. However, because of complex underlying object layouts and the scarcity of multi-label training images, how to achieve better performance on multi-label images with CNNs is still an open problem. In this work, we propose an improved deep CNN model that extracts features of objects at different scales in multi-label images through spatial pyramid pooling and feature fusion. During training, we first transfer parameters pre-trained on ImageNet to our model; we then train an adversarial network to generate examples with occlusions, which makes our model invariant to occlusions. Experimental results on the Pascal VOC 2012 and Corel 5K image datasets demonstrate the superiority of the proposed approach over many existing approaches. Our model reaches an mAP of 84.0% on VOC 2012, which significantly outperforms most approaches and comes close to HCP, a representative multi-label classification approach.
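
Spatial pyramid pooling is what lets a CNN turn feature maps of arbitrary spatial size into a fixed-length vector covering several scales. The sketch below is a minimal NumPy illustration of the general technique, not the authors' implementation: the abstract does not state the pyramid levels or pooling operator, so the levels (1, 2, 4), max pooling, and the function name `spatial_pyramid_pool` are assumptions chosen for illustration.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Assumed configuration (not taken from the paper): max pooling with
    pyramid levels (1, 2, 4). For each level n the map is split into an
    n x n grid of bins and each bin is max-pooled per channel, so the
    output length is C * sum(n * n) regardless of H and W.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row bounds: floor for the start, ceil for the end, so the
            # bins cover the whole map and are never empty.
            r0 = (i * h) // n
            r1 = -(-((i + 1) * h) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = -(-((j + 1) * w) // n)
                # Per-channel maximum over this spatial bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = spatial_pyramid_pool(rng.standard_normal((256, 7, 5)))
    large = spatial_pyramid_pool(rng.standard_normal((256, 13, 17)))
    # Both inputs yield vectors of length 256 * (1 + 4 + 16) = 5376.
    print(small.shape, large.shape)
```

Because each coarse bin summarizes a large region and each fine bin a small one, the concatenated vector mixes object evidence at several scales; in a multi-label setting this fixed-length vector would typically feed a classifier with one sigmoid output per label.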
