Abstract
The goal of multi-label image classification is to accurately assign a set of labels to the objects in an image. Although promising results have been achieved, most existing methods cannot effectively learn multi-scale features, making it difficult to identify small-scale objects. Moreover, current attention-based methods tend to focus on the most salient feature regions in images but fail to excavate the potentially useful features concealed by them, limiting further improvement of model performance. To address these issues, we propose a novel Transformer-based Feature Learning network (FL-Tran) that learns salient features and excavates potentially useful ones. Specifically, to overcome the difficulty of identifying small-scale objects, we first present a novel multi-scale fusion module (MSFM) that aligns high-level and low-level features to learn multi-scale representations. Additionally, a spatial attention module (SAM) built on a transformer encoder is introduced to capture salient object features and enhance model performance. Furthermore, we devise a feature enhancement and suppression module (FESM) to excavate the potentially useful features concealed by the most salient ones: by suppressing the most salient features obtained in the current SAM layer, FESM forces the subsequent SAM layer to excavate the remaining salient features in the feature maps, so the FL-Tran model learns diverse useful features more comprehensively. Extensive experiments on the MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets demonstrate that our proposed FL-Tran model outperforms current state-of-the-art methods.
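The enhance-then-suppress idea behind FESM can be illustrated with a minimal sketch. This is a hypothetical re-implementation of the mechanism described above, not the authors' code: the function name `suppress_salient`, the flattened feature layout, and the threshold `tau` are all assumptions made for illustration. Features at salient spatial locations are first enhanced, then the most salient locations are zeroed out so that a subsequent attention layer is forced to attend elsewhere.

```python
def suppress_salient(features, attention, tau=0.7):
    """Illustrative enhance/suppress step (assumed interface, not the paper's code).

    features:  list of per-location feature vectors, shape (H*W, C)
    attention: list of per-location saliency scores in [0, 1], shape (H*W,)
    tau:       assumed saliency threshold; locations scoring >= tau are suppressed

    Returns (enhanced, suppressed): enhanced features for the current layer's
    prediction, and a suppressed copy passed on so the next attention layer
    must excavate less salient but potentially useful regions.
    """
    # Enhance each location's features in proportion to its saliency.
    enhanced = [[f * (1.0 + a) for f in row]
                for row, a in zip(features, attention)]
    # Suppress (zero out) the most salient locations for the next layer.
    suppressed = [row if a < tau else [0.0] * len(row)
                  for row, a in zip(enhanced, attention)]
    return enhanced, suppressed

# Toy example: 4 spatial locations, 3 channels, uniform unit features.
feats = [[1.0, 1.0, 1.0] for _ in range(4)]
attn = [0.9, 0.2, 0.8, 0.1]
enh, sup = suppress_salient(feats, attn)
# Locations 0 and 2 (saliency >= 0.7) are zeroed in `sup`; locations 1 and 3
# survive, so a later layer attends to these previously concealed regions.
```

In the paper's design this suppression is applied between stacked SAM layers; the sketch above only captures the masking logic, not the transformer attention itself.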