Abstract

Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize both seen and unseen classes. It does so by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., an attribute space). Recently, generative adversarial networks (GANs) have gained considerable attention in GZSL. A GAN can generate the missing unseen-class samples from class-specific semantic embeddings for training, thereby transforming GZSL into a conventional classification task and achieving impressive results. However, owing to training instability and the complexity of the data distribution, a simple GAN framework cannot capture the real data distribution perfectly, and a large gap remains between the generated and real sample distributions, which severely limits GZSL performance. To address this, the proposed GAN-MVAE further aligns the real and generated samples by mapping them into the latent space of a multi-modal reconstruction variational autoencoder (MVAE), while preserving discriminative semantic information through cross-modal reconstruction. GAN-MVAE also offers some inspiration for the study of multi-modal alignment and asymmetric VAEs. Extensive experiments on four GZSL benchmark datasets show that GAN-MVAE significantly outperforms the state of the art.
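The core objective described above can be illustrated with a toy sketch. This is not the authors' actual architecture: all dimensions, the linear "encoders"/"decoders" (`W_v`, `W_a`, `D_v`, `D_a`), and the random data are hypothetical stand-ins, used only to show how an alignment term (pulling real and GAN-generated codes together in a shared latent space) can be combined with cross-modal reconstruction terms (decoding one modality from the other's latent code).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
d_feat, d_attr, d_lat = 8, 4, 3

# Toy linear "encoders" mapping visual features and semantic attributes
# into a shared latent space (stand-ins for the MVAE encoders).
W_v = rng.normal(size=(d_feat, d_lat))   # visual  -> latent
W_a = rng.normal(size=(d_attr, d_lat))   # attribute -> latent

# Toy linear "decoders" reconstructing each modality from a latent code.
D_v = rng.normal(size=(d_lat, d_feat))   # latent -> visual
D_a = rng.normal(size=(d_lat, d_attr))   # latent -> attribute

x_real = rng.normal(size=(5, d_feat))    # real seen-class features (toy data)
x_gen  = rng.normal(size=(5, d_feat))    # GAN-generated features (toy data)
a      = rng.normal(size=(5, d_attr))    # class-level semantic attributes

# Encode both sample sets and the attributes into the shared latent space.
z_real, z_gen, z_attr = x_real @ W_v, x_gen @ W_v, a @ W_a

# Alignment term: push real and generated latent codes together,
# narrowing the gap between the two sample distributions.
align_loss = np.mean((z_real - z_gen) ** 2)

# Cross-modal reconstruction terms: decode the *other* modality from each
# latent code, which encourages the latent space to keep discriminative
# semantic information.
cross_loss = (np.mean((z_attr @ D_v - x_real) ** 2)
              + np.mean((z_real @ D_a - a) ** 2))

total_loss = align_loss + cross_loss
```

In a real model these maps would be nonlinear networks trained jointly with the GAN; the sketch only makes the shape of the combined objective concrete.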
