Abstract

Most zero-shot learning (ZSL) methods based on generative adversarial networks (GANs) use random noise and semantic descriptions to synthesize image features of unseen classes, which alleviates the imbalance of training data between seen and unseen classes. However, these methods usually learn only the distributions of seen classes during training and ignore the unseen ones. Because seen and unseen samples, i.e., image features, follow different distributions, these methods cannot generate unseen features of sufficient quality, and their performance is therefore limited, especially in the generalized zero-shot learning (GZSL) setting. In this article, we propose a general transductive method based on GANs, called GT-GAN, which improves the quality of the generated unseen image features and thereby benefits classification. A new loss function is introduced to make the relative positions between each unseen image and its $k$ nearest neighbors in the feature space as consistent as possible with their relative positions in the semantic space; this loss function can easily be applied to most existing GAN-based models. Experimental results on five benchmark datasets show a significant improvement in accuracy over the original models, especially in the GZSL setting.
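As a concrete illustration (the paper's exact equation appears in the full text and may differ), one plausible form of such a neighborhood-consistency loss matches normalized pairwise distances between the two spaces; here $\tilde{x}_u$ denotes a generated unseen feature, $t_u$ its semantic attribute vector, and $N_k(u)$ the $k$ nearest neighbors of $u$ in the semantic space, all of which is notation assumed for this sketch:

$$\mathcal{L}_{knn} = \sum_{u} \sum_{j \in N_k(u)} \left( \frac{\lVert \tilde{x}_u - \tilde{x}_j \rVert_2}{\sum_{j' \in N_k(u)} \lVert \tilde{x}_u - \tilde{x}_{j'} \rVert_2} - \frac{\lVert t_u - t_j \rVert_2}{\sum_{j' \in N_k(u)} \lVert t_u - t_{j'} \rVert_2} \right)^2$$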

Highlights

  • Supervised learning methods have achieved great success in the domain of machine learning

  • We evaluate our method on five popular benchmark datasets, i.e., Caltech-UCSD Birds-200-2011 (CUB) [27], Oxford Flowers (FLO) [28], SUN Attribute (SUN) [29], Animals with Attributes (AWA) [30], and aPascal/aYahoo (aPY) [31], in both the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) settings

  • We can see that GT-GAN applied to f-CLSWGAN improves accuracy over the original f-CLSWGAN by 4.9%, 4.5%, 3.1%, 7.5%, and 6.9% on CUB, FLO, SUN, AWA, and aPY, respectively, and GT-GAN applied to LisGAN achieves improvements of 4.0%, 4.9%, 2.6%, 5.4%, and 5.8% over LisGAN

Summary

INTRODUCTION

Supervised learning methods have achieved great success in machine learning. Assisted by data generation models such as GANs [5] and variational autoencoders (VAEs) [6], [21], many synthesis methods have been proposed that directly generate unseen samples from their semantic attributes, which transforms zero-shot learning into a conventional supervised learning problem. The training process for each epoch can be described as follows: in the first block, given the dataset $S$ and random noise $z \sim \mathcal{N}(0, 1)$, the generator $G$ combines the seen-class attribute vectors $t_s$ with $z$ to generate fake seen image features $x_s = G(z, t_s)$. Confident samples, i.e., those considered to be correctly classified, are then used to fine-tune the classification results as in [22].
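A minimal PyTorch-style sketch of this conditional generation step is given below; the architecture, layer sizes, and names (ConditionalGenerator, noise_dim, attr_dim, feat_dim) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Illustrative conditional generator: maps (noise, attribute) pairs
    to synthetic image features. Layer sizes are assumptions, not the
    paper's exact architecture."""
    def __init__(self, noise_dim=128, attr_dim=312, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + attr_dim, 4096),
            nn.LeakyReLU(0.2),
            nn.Linear(4096, feat_dim),
            nn.ReLU(),  # CNN image features (e.g., ResNet activations) are non-negative
        )

    def forward(self, z, t):
        # Concatenate noise z with the class attribute vector t,
        # then map the pair to the feature space: x = G(z, t).
        return self.net(torch.cat([z, t], dim=1))

# Usage: synthesize a batch of fake seen-class features x_s = G(z, t_s).
G = ConditionalGenerator()
t_s = torch.rand(64, 312)   # seen-class attribute vectors (dimension is an assumption)
z = torch.randn(64, 128)    # z ~ N(0, 1)
x_s = G(z, t_s)             # fake seen image features, shape (64, 2048)
```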

FULL ALGORITHM PROCEDURE

[Algorithm box, only partially recovered from extraction: in each training iteration, step 6 calculates the discriminator loss $\mathcal{L}_D$ by Eq. 6 together with its gradient, and step 13 calculates the full loss function $\mathcal{L}_{G_f}$ by Eq. 7 together with its gradient.]
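The full equations (Eqs. 6 and 7) are not reproduced on this page, so the sketch below is a minimal PyTorch-style training step that assumes a WGAN-style critic objective standing in for $\mathcal{L}_D$ and a generator objective standing in for $\mathcal{L}_{G_f}$ that adds a weighted k-NN consistency term (implementing the normalized-distance form sketched after the abstract) to the adversarial loss; every function and variable name here is a hypothetical illustration.

```python
import torch

def knn_consistency_loss(x, t, k=3):
    # Hypothetical k-NN relative-position loss: match normalized pairwise
    # distances in the feature space to those in the semantic space
    # (one plausible reading of the paper's loss, not its exact equation).
    d_x = torch.cdist(x, x)  # pairwise feature-space distances
    d_t = torch.cdist(t, t)  # pairwise semantic-space distances
    idx = d_t.topk(k + 1, largest=False).indices[:, 1:]  # k semantic neighbors, self excluded
    nx = d_x.gather(1, idx)
    nt = d_t.gather(1, idx)
    nx = nx / (nx.sum(dim=1, keepdim=True) + 1e-8)  # relative positions in feature space
    nt = nt / (nt.sum(dim=1, keepdim=True) + 1e-8)  # relative positions in semantic space
    return ((nx - nt) ** 2).sum(dim=1).mean()

def training_step(G, D, x_real, t, opt_D, opt_G, noise_dim=128, lam=1.0):
    # One illustrative iteration; a WGAN-style critic loss stands in for
    # Eq. 6 and an adversarial + consistency loss stands in for Eq. 7.
    z = torch.randn(x_real.size(0), noise_dim)

    # Step 6 (assumed form): loss L_D and its gradient, then update D.
    x_fake = G(z, t).detach()
    loss_D = D(x_fake, t).mean() - D(x_real, t).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Step 13 (assumed form): full loss L_Gf and its gradient, then update G.
    x_fake = G(z, t)
    loss_Gf = -D(x_fake, t).mean() + lam * knn_consistency_loss(x_fake, t)
    opt_G.zero_grad(); loss_Gf.backward(); opt_G.step()
    return loss_D.item(), loss_Gf.item()
```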