Abstract

Zero-shot learning (ZSL) aims to learn a classifier for unseen classes by exploiting both training data from seen classes and external knowledge. In many visual tasks such as image classification, a set of high-level attributes that describe the semantic properties of classes is used as the external knowledge to bridge seen and unseen classes. While the attributes are usually treated equally in previous ZSL studies, we observe that the contribution of different attributes varies significantly over model training. To adaptively exploit the discriminative information embedded in different attributes, we propose a novel encoder–decoder framework with an attention mechanism at the attribute level for zero-shot learning. Specifically, by mapping the visual features into a semantic space, the more discriminative attributes are emphasized with larger attention weights. Further, the attentive attributes and the class prototypes are simultaneously decoded to the visual space so that the hubness problem can be eased. Finally, the labels are predicted in the visual space. Extensive experiments on multiple benchmark datasets demonstrate that our proposed model achieves a significant boost over several state-of-the-art methods on the ZSL task and competitive results on the generalized ZSL task.
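
To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the encode–attend–decode idea outlined above. The layer sizes, the form of the attention module, and the nearest-prototype scoring in the visual space are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AttributeAttentionZSL(nn.Module):
    """Illustrative encoder-decoder with attribute-level attention (hypothetical sizes)."""

    def __init__(self, vis_dim=2048, attr_dim=85):
        super().__init__()
        # Encoder: visual features -> semantic (attribute) space
        self.encoder = nn.Linear(vis_dim, attr_dim)
        # Attention: per-attribute weights conditioned on the visual features
        self.attention = nn.Linear(vis_dim, attr_dim)
        # Decoder: attribute space -> visual space (shared for samples and prototypes)
        self.decoder = nn.Linear(attr_dim, vis_dim)

    def forward(self, vis_feat, class_prototypes):
        # vis_feat: (batch, vis_dim); class_prototypes: (num_classes, attr_dim)
        attrs = self.encoder(vis_feat)                               # predicted attributes
        weights = torch.softmax(self.attention(vis_feat), dim=-1)    # attribute attention
        attentive_attrs = weights * attrs                            # emphasize discriminative attributes
        # Decode both the attentive attributes and the class prototypes into the visual space
        dec_sample = self.decoder(attentive_attrs)                   # (batch, vis_dim)
        dec_protos = self.decoder(class_prototypes)                  # (num_classes, vis_dim)
        # Predict labels by nearest decoded prototype in the visual space
        scores = -torch.cdist(dec_sample, dec_protos)                # (batch, num_classes)
        return scores


# Usage sketch with random tensors; scores.argmax(dim=1) gives predicted class indices.
model = AttributeAttentionZSL()
vis = torch.randn(4, 2048)
protos = torch.randn(10, 85)
print(model(vis, protos).shape)  # torch.Size([4, 10])
```

Scoring against prototypes decoded into the visual space (rather than comparing in the semantic space) reflects the abstract's motivation for easing the hubness problem.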
