Abstract
Most zero-shot learning (ZSL) methods based on generative adversarial networks (GANs) use random noise and semantic descriptions to synthesize image features of unseen classes, which alleviates the training-data imbalance between seen and unseen classes. However, these methods usually learn only the distributions of seen classes during training, ignoring the unseen ones. Because the distributions of seen and unseen samples, i.e., image features, differ, these methods cannot generate unseen features of sufficient quality, so their performance is limited, especially in the generalized zero-shot learning (GZSL) setting. In this article, we propose a general transductive method based on GANs, called GT-GAN, which improves the quality of generated unseen image features and therefore benefits classification. A new loss function is introduced to make the relative positions between each unseen image and its $k$ nearest neighbors in the feature space as consistent as possible with their relative positions in the semantic space; this loss function can easily be applied in most existing GAN-based models. Experimental results on five benchmark datasets show a significant improvement in accuracy over the original models, especially in the GZSL setting.
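The $k$-nearest-neighbor consistency loss described above can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's implementation: for each sample, the (normalized) distances to its $k$ nearest neighbors in the semantic space are encouraged to match the corresponding distances in the feature space.

```python
import numpy as np

def knn_consistency_loss(feat, sem, k=3):
    """Sketch of a relative-position consistency loss (hypothetical).

    feat : (n, d_f) generated image features for n unseen samples
    sem  : (n, d_s) semantic attribute vectors for the same samples
    For each sample i, find its k nearest neighbors in the semantic
    space and penalize the squared difference between the normalized
    feature-space and semantic-space distances to those neighbors.
    """
    def norm_pdist(x):
        # pairwise Euclidean distances, scaled to [0, 1] so the two
        # spaces are comparable
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        return d / (d.max() + 1e-12)

    df, ds = norm_pdist(feat), norm_pdist(sem)
    n = sem.shape[0]
    loss = 0.0
    for i in range(n):
        nbrs = np.argsort(ds[i])[1:k + 1]  # skip index i itself (distance 0)
        loss += np.mean((df[i, nbrs] - ds[i, nbrs]) ** 2)
    return loss / n
```

If the generated features are arranged exactly like the semantic attributes, the loss is zero; any distortion of the local neighborhood structure increases it, which is the behavior the paper's loss term is meant to encourage.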
Highlights
Supervised learning methods have achieved great success in the domain of machine learning.
DATASETS AND EVALUATION PROTOCOL: We evaluate our method on five popular benchmark datasets, i.e., Caltech-UCSD Birds 200-2011 (CUB) [27], Oxford Flowers (FLO) [28], SUN Attribute (SUN) [29], Animals with Attributes (AWA) [30], and aPascal/aYahoo (aPY) [31], in both the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) settings.
We can see that the GT-GANf-CLSWGAN method improves accuracy over the original f-CLSWGAN by 4.9%, 4.5%, 3.1%, 7.5%, and 6.9% on CUB, FLO, SUN, AWA, and aPY, respectively, and GT-GANLisGAN achieves 4.0%, 4.9%, 2.6%, 5.4%, and 5.8% improvements over LisGAN.
Summary
Supervised learning methods have achieved great success in the domain of machine learning. Assisted by data-generation models such as GANs [5] and variational autoencoders (VAEs) [6], [21], many synthesizing methods have been proposed that generate unseen samples directly from their semantic attributes, which transforms zero-shot learning into a conventional supervised learning problem. The training process for each epoch can be described as follows: in the first block, given the dataset S and random noise z ∼ N(0, 1), the generator G uses the seen-class attribute vectors ts together with z to generate fake seen image features xs = G(z, ts). We then leverage the confident samples, i.e., those considered to be correctly classified, to fine-tune the classification results as in [22].
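The feature-synthesis step above, xs = G(z, ts), can be illustrated with a minimal conditional generator. This is a toy NumPy stand-in (a two-layer MLP with made-up dimensions), not the paper's actual GAN architecture: it simply shows how noise and a class-attribute vector are concatenated and mapped to a fake image feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(attr_dim, noise_dim, feat_dim, hidden=64):
    """Tiny MLP generator G(z, t) (hypothetical stand-in for the
    paper's conditional generator): concatenates random noise z and
    a class-attribute vector t, then maps them to an image feature."""
    W1 = rng.standard_normal((noise_dim + attr_dim, hidden)) * 0.1
    W2 = rng.standard_normal((hidden, feat_dim)) * 0.1

    def G(z, t):
        h = np.maximum(np.concatenate([z, t], axis=-1) @ W1, 0.0)  # ReLU
        return h @ W2

    return G

# One synthesis step, schematically: sample z ~ N(0, 1) and generate
# fake seen features x_s = G(z, t_s) from seen-class attributes t_s.
# The dimensions below are illustrative, not taken from the paper.
G = make_generator(attr_dim=85, noise_dim=16, feat_dim=128)
t_s = rng.standard_normal((4, 85))   # seen-class attribute vectors
z = rng.standard_normal((4, 16))     # random noise, one row per sample
x_s = G(z, t_s)                      # fake seen image features, shape (4, 128)
```

In the transductive setting the same generator is also fed unseen-class attributes, and the resulting fake unseen features are what the consistency loss and the confident-sample fine-tuning operate on.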