Abstract

Recent works have shown that multi-label image recognition is still a challenging task in computer vision due to the complicatedness and diversity of multi-label images. However, the existing works ignore the co-occurrence correlation and global contextual information between image space and objects. We present a model to solve these problems. On the one hand, we devise the graph attention mechanism to compute the hidden representations of different categories in multi-label images. It can specify different weights to different neighbor objects and well model the label dependency. On the other hand, we iterate the global contextual information by the second-order covariance pooling to enhance nonlinear modeling capability and use basic residual network to extract features. The proposed model is thoroughly evaluated on PASCAL VOC 2007 and MS-COCO datasets. Compared with classical ML-GCN, the model can better combine the image features and label embedding. Meanwhile, experiments show that it outperforms the state-of-the-art methods such as residual multi-layer perceptron, EfficientNet, and vision transformer.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.