Abstract

The goal of generalized zero-shot learning (GZSL) is to train a model that can classify samples from both seen and unseen categories when only labeled samples from seen categories are available. In this paper, we propose a GZSL approach based on conditional generative models that adopts a contrastive disentanglement learning framework to disentangle visual information in the latent space. Specifically, our model encodes original and generated visual features into a latent space in which they are disentangled into semantic-related and semantic-unrelated representations. The proposed contrastive learning framework leverages both class-level and instance-level supervision: it formulates a contrastive loss on semantic-related information at the instance level, and it exploits semantic-unrelated representations together with the corresponding semantic information to form negative pairs at the class level, which further facilitates disentanglement. GZSL classification is then performed by training a supervised model (e.g., a softmax classifier) on the semantic-related representations alone. Experimental results show that our model achieves state-of-the-art performance on several benchmark datasets, especially on unseen categories. The source code of the proposed model is available at: https://github.com/fwt-team/GZSL.
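To make the instance-level term concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss of the kind the abstract describes, computed over semantic-related codes. This is illustrative only and not the paper's implementation: the function name, the temperature value, and the pairing convention (row i of `positives` matches row i of `anchors`, all other rows act as negatives) are assumptions.

```python
# Hypothetical sketch of an instance-level contrastive (InfoNCE-style) loss
# over semantic-related representations; not the paper's actual code.
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """anchors, positives: (N, d) semantic-related codes.
    Row i of `positives` is the positive for row i of `anchors`;
    every other row in `positives` serves as a negative."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # matched (anchor, positive) pairs sit on the diagonal
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls each anchor toward its positive and pushes it away from the other samples in the batch, which is the standard mechanism by which instance-level contrastive supervision shapes the semantic-related latent space.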
