Abstract

Zero-shot image recognition aims to mimic the human ability to recognize images of novel classes without having seen examples of them. For zero-shot learning (ZSL), it is crucial to learn transferable knowledge from seen classes and generalize it to unseen classes. Most existing ZSL methods extract visual features with pretrained backbone networks and learn transferable knowledge from the extracted features. However, because the backbone networks are not pretrained for the ZSL task, the extracted visual features often contain distracting information, which causes discriminative information to be ignored or weakened and degrades the quality of the knowledge learned from seen classes. Moreover, since visual samples of unseen classes are unavailable during training, domain shift is another challenging problem. In this paper, we propose visual feature enhancement, which learns more discriminative visual features via a graph convolutional network (GCN) and an attention mechanism to improve the quality of the learned transferable knowledge. Unlike previous works, we explore the correlations between different latent visual patterns of an image and introduce a GCN to enhance the visual features. In addition, we exploit the different learning mechanisms of the GCN and an MLP and propose dual classifier learning to improve the generalization and inference capabilities of our model. During end-to-end training, the visual feature enhancement module and the dual classifier learning module benefit each other through joint optimization. Finally, extensive experiments in both the ZSL and GZSL settings verify the effectiveness and superiority of our method.
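The GCN-based feature enhancement described above can be sketched minimally. Everything below is a hypothetical illustration under our own assumptions, not the paper's implementation: local backbone features are treated as latent visual patterns, a cosine-similarity graph connects correlated patterns, one symmetrically normalized GCN layer propagates information over that graph, and attention pooling produces a single enhanced image feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 7x7 grid of 512-d local features from a backbone,
# treated as N = 49 latent visual patterns of one image.
N, d = 49, 512
X = rng.standard_normal((N, d))

# Build a graph over patterns from cosine similarity (an assumed
# correlation measure); keep only positively correlated pairs.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
A = Xn @ Xn.T
A = np.where(A > 0.0, A, 0.0)
A = A + np.eye(N)                        # add self-loops

# Symmetric normalization D^{-1/2} A D^{-1/2}, as in a standard GCN layer.
Dinv = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * Dinv[:, None] * Dinv[None, :]

# One GCN layer: each pattern aggregates its correlated neighbours.
W = rng.standard_normal((d, d)) * 0.02   # stand-in for a learned weight
H = np.maximum(A_hat @ X @ W, 0.0)       # ReLU activation

# Simple attention pooling to one enhanced image feature (hypothetical head).
w_att = rng.standard_normal(d)
scores = H @ w_att
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                     # softmax attention weights
enhanced = alpha @ H                     # (d,) enhanced visual feature
```

In a trained model, `W` and `w_att` would be learned jointly with the classifiers rather than sampled at random; the sketch only shows how correlations between patterns can reweight and mix local features before classification.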
