Abstract
Zero-shot learning enables the recognition of classes not seen during training by using semantic information that describes each class visually, in either textual or attribute form. Despite advances in the performance of zero-shot learning methods, most works do not explicitly exploit the correlation between the visual attributes of an image and their corresponding semantic attributes to learn discriminative visual features. In this paper, we introduce an attention-based strategy for deriving features from the image regions associated with the most prominent attributes of the image class. In particular, we train a Convolutional Neural Network (CNN) for image attribute prediction and use a gradient-weighted method to derive the attention activation maps of the most salient image attributes. These maps are then incorporated into the feature extraction process of Zero-Shot Learning (ZSL) approaches to improve the discriminability of the resulting features through the implicit inclusion of semantic information. For experimental validation, we measured the performance of state-of-the-art ZSL methods using features with and without the proposed attention model. Surprisingly, we find that the proposed strategy degrades the performance of ZSL methods on classical ZSL datasets (AWA2), but can significantly improve performance on face datasets. Our experiments show that these results are a consequence of the interpretability of the dataset attributes, suggesting that the attributes of existing ZSL datasets are, in most cases, difficult to identify in the image. Source code is available at https://github.com/CristianoPatricio/SGAM.
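The gradient-weighted attention step described above can be illustrated with a minimal sketch. It assumes a Grad-CAM-style formulation: each feature map of the last convolutional layer is weighted by the spatial average of the gradient of one attribute score with respect to that map. The function names `grad_cam` and `apply_attention`, and the multiplicative way the map is fused with the features, are hypothetical choices made for illustration; they are not taken from the paper or its repository.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Gradient-weighted activation map for a single attribute score.

    activations[k][i][j]: k-th feature map of the last convolutional layer.
    gradients[k][i][j]:   d(attribute score) / d(activations[k][i][j]).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatially averaged gradient of that channel.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1]; an all-zero map is returned unchanged.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


def apply_attention(activations: List[Map2D], cam: Map2D) -> List[Map2D]:
    """Assumed fusion step: scale every channel by the attention map."""
    return [
        [[a[i][j] * cam[i][j] for j in range(len(cam[0]))] for i in range(len(cam))]
        for a in activations
    ]
```

In practice the activations and gradients would come from a forward and backward pass of the attribute-prediction CNN; plain nested lists are used here only to keep the sketch free of dependencies.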