Abstract

Zero-shot learning (ZSL) in visual classification aims to recognize novel categories for which no training samples are available. Building on recent advances in generative adversarial networks (GANs) for cross-modal generation, several generative methods have been explored that classify unseen categories using synthetic samples. However, these GAN-based ZSL approaches still struggle to generate samples that are semantically consistent and clearly separated between classes while remaining diverse within each class, properties that are vital to building classifiers for unseen classes. Accordingly, in this paper we propose a robust dual-stream GAN to synthesize such samples for zero-shot visual classification. First, inter-class discrepancy is maximized by a backbone compatibility loss, which drives the center of the synthesized samples of a class towards the center of the real samples of the same class and away from samples of other classes. Second, to preserve the intra-class diversity ignored by most existing paradigms, we propose a stochastic dispersion regularization that encourages the synthesized samples to spread over the visual space of their categories. Finally, unlike previous methods that project visual samples back into the semantic space and thereby suffer information degradation, we design a dual-stream generator that synthesizes visual samples and reconstructs semantic embeddings simultaneously, ensuring semantic consistency. Our model outperforms the state of the art by 4.7% and 3.0% on average on two metrics over four real-world datasets, demonstrating its effectiveness.
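The abstract only names the three ingredients; as a rough illustration, the PyTorch sketch below shows one way they could be instantiated. Everything here is an assumption inferred from the description above, not the paper's implementation: the network sizes, the margin-based form of the compatibility loss, and the pairwise-distance form of the dispersion regularizer are all placeholders.

# Hypothetical sketch of the three components described in the abstract.
# Names, dimensions, and the exact loss forms are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamGenerator(nn.Module):
    # One shared trunk with two heads: a visual stream that synthesizes a
    # feature vector and a semantic stream that reconstructs the class
    # embedding, enforcing semantic consistency inside the generator.
    def __init__(self, noise_dim=128, sem_dim=312, vis_dim=2048, hidden=1024):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(noise_dim + sem_dim, hidden),
            nn.LeakyReLU(0.2),
        )
        self.visual_head = nn.Linear(hidden, vis_dim)
        self.semantic_head = nn.Linear(hidden, sem_dim)

    def forward(self, z, s):
        h = self.trunk(torch.cat([z, s], dim=1))
        return self.visual_head(h), self.semantic_head(h)

def compatibility_loss(fake, labels, real_centers, margin=1.0):
    # Pull the per-class center of synthesized features toward the real
    # center of the same class and push it away from other classes'
    # centers (one margin-based reading of the compatibility loss).
    # real_centers is assumed to be a [num_classes, vis_dim] tensor
    # indexed by class label.
    classes = labels.unique()
    loss = fake.new_zeros(())
    for c in classes:
        center = fake[labels == c].mean(dim=0)
        pos = F.mse_loss(center, real_centers[c])
        negs = [F.mse_loss(center, real_centers[k]) for k in classes if k != c]
        if negs:  # a batch with a single class has no negatives
            loss = loss + F.relu(pos - torch.stack(negs).mean() + margin)
        else:
            loss = loss + pos
    return loss / len(classes)

def dispersion_regularizer(fake, labels):
    # Reward intra-class spread: the mean pairwise distance among
    # synthesized samples of the same class is maximized (hence the
    # negative sign), discouraging collapse onto the class center.
    reg = fake.new_zeros(())
    for c in labels.unique():
        x = fake[labels == c]
        if x.size(0) >= 2:
            reg = reg + torch.pdist(x).mean()
    return -reg

In training, these terms would presumably be combined with the usual adversarial loss and a semantic reconstruction loss (e.g. a mean-squared error between the semantic head's output and the conditioning embedding), so that the visual and semantic streams are optimized jointly.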
