Abstract
Multi-label image classification (MLIC), a fundamental task assigning multiple labels to each image, has been seen notable progress in recent years. Considering simultaneous appearances of objects in the physical world, modeling object correlations is crucial for enhancing classification accuracy. This involves accounting for spatial image feature correlation and label semantic correlation. However, existing methods struggle to establish these correlations due to complex spatial location and label semantic relationships. On the other hand, regarding the fusion of image feature relevance and label semantic relevance, existing methods typically learn a semantic representation in the final CNN layer to combine spatial and label semantic correlations. However, different CNN layers capture features at diverse scales and possess distinct discriminative abilities. To address these issues, in this paper we introduce the Semantic Guidance-Based Fusion Network (SGFN) for MLIC. To model spatial image feature correlation, we leverage the advanced TResNet architecture as the backbone network and employ the Feature Aggregation Module for capturing global spatial correlation. For label semantic correlation, we establish both local and global semantic correlation. We further enrich model features by learning semantic representations across multiple convolutional layers. Our method outperforms current state-of-the-art techniques on PASCAL VOC (2007, 2012) and MS-COCO datasets.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.