Zero-shot learning addresses the classification of unseen categories, while generalized zero-shot learning aims to classify samples drawn from both seen and unseen classes, where "seen" classes are those available during training and "unseen" classes are not. With the advance of deep learning, the performance of zero-shot learning has improved substantially, and generalized zero-shot learning remains a challenging problem with promising prospects in many realistic scenarios. Although zero-shot learning has made encouraging progress, existing methods still exhibit a strong bias toward seen classes over unseen classes. Recent methods focus on learning a unified, semantically aligned visual representation to transfer knowledge between the two domains, while ignoring the intrinsic characteristics of visual features, which are discriminative enough to be classified on their own. To address these problems, we propose a novel model that exploits the discriminative information of visual features to optimize the generative module, which is a dual generation network framework composed of a conditional VAE and an improved WGAN. Specifically, guided by the discriminative information of visual features, the learned generator synthesizes visual features of unseen categories from the corresponding semantic embeddings, and the synthesized features are then used to train the final softmax classifier, thereby enabling the recognition of unseen categories. In addition, we analyze the effect of additional classifiers with different structures on the transfer of discriminative information. We conduct extensive experiments on six commonly used benchmark datasets (AWA1, AWA2, APY, FLO, SUN, and CUB). The experimental results show that our model outperforms several state-of-the-art methods in both conventional and generalized zero-shot learning.
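To make the described pipeline concrete, the sketch below outlines, under stated assumptions, how a conditional-VAE encoder, a shared generator, and a WGAN critic conditioned on class semantic embeddings could be combined to synthesize unseen-class features and train a final softmax classifier. The layer widths, dimensions, loss weights, and helper names (`Encoder`, `Generator`, `Critic`, `synthesize_unseen_features`, `train_softmax_classifier`) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumptions): CVAE encoder + shared generator/decoder, a WGAN critic,
# and a softmax classifier trained on features synthesized for unseen classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, ATTR_DIM, Z_DIM = 2048, 85, 64  # hypothetical feature/attribute/latent sizes

class Encoder(nn.Module):
    """q(z | x, a): VAE encoder conditioned on the class semantic embedding a."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 1024), nn.ReLU())
        self.mu, self.logvar = nn.Linear(1024, Z_DIM), nn.Linear(1024, Z_DIM)
    def forward(self, x, a):
        h = self.net(torch.cat([x, a], dim=1))
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):
    """p(x | z, a): decoder shared between the VAE and the WGAN branch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM + ATTR_DIM, 1024), nn.ReLU(),
                                 nn.Linear(1024, FEAT_DIM), nn.ReLU())
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=1))

class Critic(nn.Module):
    """WGAN critic conditioned on semantics (adversarial training loop omitted here)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 1024), nn.ReLU(),
                                 nn.Linear(1024, 1))
    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=1))

def vae_loss(enc, gen, x, a):
    """Reconstruction + KL term of the conditional VAE branch."""
    mu, logvar = enc(x, a)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
    rec = F.mse_loss(gen(z, a), x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

def synthesize_unseen_features(gen, unseen_attrs, per_class=300):
    """Generate visual features for each unseen class from its semantic embedding."""
    feats, labels = [], []
    for y, a in enumerate(unseen_attrs):                       # a: (ATTR_DIM,)
        z = torch.randn(per_class, Z_DIM)
        feats.append(gen(z, a.expand(per_class, -1)))
        labels.append(torch.full((per_class,), y, dtype=torch.long))
    return torch.cat(feats), torch.cat(labels)

def train_softmax_classifier(feats, labels, num_classes, epochs=25):
    """Final stage: a plain softmax (linear) classifier on the synthesized features."""
    clf = nn.Linear(FEAT_DIM, num_classes)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(clf(feats.detach()), labels).backward()
        opt.step()
    return clf
```

In such a setup, the encoder, generator, and critic would first be trained jointly on seen-class features (VAE loss plus the Wasserstein adversarial objective, with the discriminative signal from an auxiliary classifier regularizing the generator); only afterwards would `synthesize_unseen_features` and `train_softmax_classifier` be used to build the final recognizer. Again, this is a sketch of the general CVAE+WGAN feature-generation paradigm, not the authors' exact method.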