Abstract

Multi-label image classification is an essential task in image processing. How to improve the correlation between labels by learning multi-scale features from images is a very challenging problem. We propose a Double Attention Network (DAN) to improve the correlation between image feature regions and labels, as well as between labels and labels. Firstly, the dynamic learning strategy is used to extract the multi-scale features of the image to solve the problem of inconsistent scale of objects in the image. Secondly, in order to improve the correlation between the image feature regions and the labels, we use the spatial attention module to focus on the important regions of the image to learn their salient features, while we use the channel attention module to model the correlation between the channels to improve the correlation between the labels. Finally, the output features of two attention modules are fused as one multi-label image classification model. Experiments on MS-COCO 2014 dataset, Pascal VOC 2007 dataset and NUS-WIDE dataset demonstrate that our model is significantly better than the state-of-the-art models. Besides, visualization analyses show that our model has a strong ability for image salient feature learning and label correlation capturing.

Highlights

  • Multi-label image classification has always been a research hotspot in the field of computer vision, which aims to recognize different objects and attributes in the image

  • Visual Analysis First of all, we use the prediction results of the Double Attention Network (DAN)-CAM model to qualitatively analyze the correlation between labels, and use the Grad-CAM[47] method to visually analyze the correlation between the important regions of the image and the labels

  • DAN is composed of three essential modules: feature extraction module, spatial attention module, and channel attention module

Read more

Summary

Introduction

Multi-label image classification has always been a research hotspot in the field of computer vision, which aims to recognize different objects and attributes in the image. Due to the powerful representation ability of the convolutional neural network [1], [2], [3], and the abundance of labeled datasets such as ImageNet [4], the deep neural network method has made significant progress in the multilabel image classification task. These methods often overlook three problems: the scale inconsistency of image objects, the correlation between image feature regions and labels, and the correlation between labels.

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.