Abstract

Generalized zero-shot learning (GZSL) is an important research area in image computing, video processing, multimedia understanding, and other visual computing tasks. GZSL typically uses transferable semantic features to represent visual features, so that unseen classes can be predicted without training on samples from those classes. State-of-the-art zero-shot learning methods combine a Generative Adversarial Network (GAN) with Contrastive Learning (CL) to transfer semantic features to visual features. However, the combined GAN and CL modules inevitably encounter a "semantic-visual inconsistency problem" in both the feature-generation process and the contrastive learning process. To address this problem, we propose a generation-based contrastive model with semantic alignment for generalized zero-shot learning. The proposed network builds on existing ZSL models that combine GAN and CL, and adds two alignment modules: a Feedback Alignment Module (FAM) and a Negative sample Alignment Module (NAM). FAM applies a multilayer perceptron (MLP) to align each synthesized visual feature back to its semantic feature, preserving semantic-visual consistency in the generator. NAM provides a new contrastive learning mechanism that aligns negative pairs, preserving semantic-visual consistency during contrastive learning. Experimental results on large-scale real-world datasets show that the proposed method achieves a new state of the art in generalized zero-shot learning. The source code of the proposed method is available at https://github.com/yangjingqi99/GCSA.
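
The feedback-alignment idea described for FAM can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss or the network sizes, so the mean squared error, the two-layer ReLU MLP, the dimensions, and all names below (`init_layer`, `mlp_forward`, `feedback_alignment_loss`) are assumptions made for illustration, and a random vector stands in for the generator output.

```python
import random

def init_layer(rng, n_out, n_in):
    """Return (weights, biases) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def linear(x, weights, biases):
    """Compute W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, layer1, layer2):
    """Two-layer MLP mapping a visual feature back to semantic space (assumed shape)."""
    hidden = [max(0.0, h) for h in linear(x, *layer1)]  # ReLU activation
    return linear(hidden, *layer2)

def feedback_alignment_loss(semantic, reconstructed):
    """Mean squared error between the original and reconstructed semantic feature.
    MSE is an assumption; the abstract does not name the loss."""
    return sum((s - r) ** 2 for s, r in zip(semantic, reconstructed)) / len(semantic)

rng = random.Random(0)
SEMANTIC_DIM, VISUAL_DIM, HIDDEN_DIM = 4, 8, 6  # hypothetical toy sizes

# Class-level semantic feature (e.g. an attribute vector), random here.
semantic = [rng.random() for _ in range(SEMANTIC_DIM)]
# Stand-in for the generator output: a real model would synthesize this
# visual feature from `semantic`; here it is just a random vector.
synthesized_visual = [rng.random() for _ in range(VISUAL_DIM)]

layer1 = init_layer(rng, HIDDEN_DIM, VISUAL_DIM)
layer2 = init_layer(rng, SEMANTIC_DIM, HIDDEN_DIM)

reconstructed = mlp_forward(synthesized_visual, layer1, layer2)
loss = feedback_alignment_loss(semantic, reconstructed)
print(f"feedback alignment loss: {loss:.4f}")
```

In a trained model, a loss of this kind would presumably be minimized jointly with the generator objective, so that synthesized visual features remain decodable back to their semantic features.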
