Abstract

Zero-shot learning (ZSL) aims to classify unseen samples based on the relationship between learned visual features and semantic features. Traditional ZSL methods typically capture the underlying multimodal data structure by learning an embedding function between the visual space and the semantic space under the Euclidean metric. However, these models suffer from the hubness and domain bias problems, which lead to unsatisfactory performance, especially on the generalized ZSL (GZSL) task. To tackle these problems, we formulate a discriminative cross-aligned variational autoencoder (DCA-VAE) for ZSL. The proposed model uses a modified cross-modal-alignment variational autoencoder (VAE) to transform both visual features and semantic features, obtained with a discriminative cosine metric, into latent features. The key to our method is that we collect the principal discriminative information from the visual and semantic features to construct latent features that preserve the discriminative multimodal information associated with unseen samples. Finally, the proposed DCA-VAE is validated on six benchmarks, including the large-scale ImageNet dataset, and experimental results demonstrate its superiority over most existing embedding-based and generative ZSL models on both the standard ZSL task and the more realistic GZSL task.
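The abstract does not specify the exact architecture, so the following is only a minimal PyTorch sketch of the two ideas it names: a cross-modal-alignment VAE, in which each modality's decoder must also reconstruct its input from the other modality's latent code, and cosine-similarity (rather than Euclidean) matching in the latent space, which is commonly used to mitigate hubness. All layer sizes, loss weights, and function names here are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModalityVAE(nn.Module):
        """One VAE branch (encoder + decoder) for a single modality
        (visual or semantic). Dimensions are illustrative assumptions."""
        def __init__(self, in_dim, latent_dim, hidden_dim=512):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.mu = nn.Linear(hidden_dim, latent_dim)
            self.logvar = nn.Linear(hidden_dim, latent_dim)
            self.dec = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, in_dim))

        def encode(self, x):
            h = self.enc(x)
            return self.mu(h), self.logvar(h)

        def reparameterize(self, mu, logvar):
            std = torch.exp(0.5 * logvar)
            return mu + std * torch.randn_like(std)

    def cross_aligned_vae_loss(vae_v, vae_s, x_v, x_s, beta=1.0, gamma=1.0):
        """Cross-modal alignment: each decoder reconstructs its own modality
        both from its own latent code and from the other branch's latent code.
        The weights beta and gamma are hypothetical hyperparameters."""
        mu_v, lv_v = vae_v.encode(x_v)
        mu_s, lv_s = vae_s.encode(x_s)
        z_v = vae_v.reparameterize(mu_v, lv_v)
        z_s = vae_s.reparameterize(mu_s, lv_s)

        # Within-modality reconstruction
        rec = F.mse_loss(vae_v.dec(z_v), x_v) + F.mse_loss(vae_s.dec(z_s), x_s)
        # Cross-reconstruction: visual decoder fed the semantic latent code,
        # and vice versa, which aligns the two latent distributions
        cross = F.mse_loss(vae_v.dec(z_s), x_v) + F.mse_loss(vae_s.dec(z_v), x_s)
        # KL divergence of each branch against the unit Gaussian prior
        kl = -0.5 * torch.mean(1 + lv_v - mu_v.pow(2) - lv_v.exp()) \
             - 0.5 * torch.mean(1 + lv_s - mu_s.pow(2) - lv_s.exp())
        return rec + gamma * cross + beta * kl

    def cosine_classify(z, class_prototypes):
        """Nearest-prototype classification in the latent space using cosine
        similarity, which is less prone to hubness than Euclidean distance."""
        z = F.normalize(z, dim=-1)
        protos = F.normalize(class_prototypes, dim=-1)
        return (z @ protos.t()).argmax(dim=-1)

At test time, a GZSL classifier of this kind would encode unseen-class semantic vectors into latent prototypes and assign each encoded test image to the nearest prototype under cosine similarity; how the paper derives its discriminative prototypes is not detailed in the abstract.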
