Abstract

Multi-label image classification (MLIC) is more challenging than single-label image classification because an image contains multiple target concepts whose complex visual relationships must be modeled. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown superior performance in local and global feature representation, respectively. However, current works neglect the interactions between local and global features. To model these critical interactions, this paper designs a Unified Feature Interaction (UFI) framework that integrates selected local features with global features, using a CNN and a ViT together. The proposed UFI includes two key modules: Class-Related Feature Selection (CRFS) and Feature Interaction Attention (FIA). Specifically, CRFS selects target regions from the activation map, guided by a preliminary calculation of predicted scores. FIA then enables local-global feature interaction between the selected target regions and the whole image. This work is an initial attempt to model interactions between local and global features for multi-label image classification. UFI provides a stable improvement over the baseline and achieves new state-of-the-art results on MS-COCO and VOC2007.
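The two-stage idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how regions are chosen or how the attention is computed, so the function names (`select_regions`, `cross_attention`), the top-k rule, the relative threshold, and the use of plain scaled dot-product attention are all assumptions made for illustration.

```python
import math


def select_regions(activation_maps, scores, top_k=2, rel_threshold=0.5):
    """Hypothetical stand-in for CRFS-style selection.

    activation_maps: one 2D grid (list of rows) per class, assumed non-negative.
    scores: preliminary per-class prediction scores.
    Returns {class_index: (row_min, col_min, row_max, col_max)} for the
    top_k highest-scoring classes.
    """
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:top_k]
    boxes = {}
    for c in ranked:
        amap = activation_maps[c]
        peak = max(max(row) for row in amap)
        # Keep cells whose activation is at least rel_threshold of the peak.
        cells = [(r, col) for r, row in enumerate(amap)
                 for col, v in enumerate(row) if v >= rel_threshold * peak]
        rows = [r for r, _ in cells]
        cols = [col for _, col in cells]
        boxes[c] = (min(rows), min(cols), max(rows), max(cols))
    return boxes


def cross_attention(local_tokens, global_tokens):
    """Hypothetical stand-in for FIA-style interaction.

    Each local token acts as a query over the global tokens, which serve
    as both keys and values (scaled dot-product attention, no learned weights).
    """
    dim = len(global_tokens[0])
    fused = []
    for q in local_tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in global_tokens]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        fused.append([sum(w * v[d] for w, v in zip(weights, global_tokens))
                      for d in range(dim)])
    return fused


# Toy inputs: three classes on a 3x3 activation grid.
maps = [
    [[0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.8], [0.0, 0.2, 0.0]],
    [[0.9, 0.0, 0.0], [0.6, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
boxes = select_regions(maps, scores=[0.1, 0.9, 0.6], top_k=2)
fused = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In a real system the local tokens would presumably come from CNN features cropped to the selected boxes and the global tokens from the ViT, with learned query, key, and value projections; the sketch omits all of that to keep the data flow visible.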
