Abstract

Zero-shot learning (ZSL) is an effective way to recognize classes for which no training samples are available. Most existing ZSL models focus on directly learning an embedding between the visual space and the semantic space. However, few ZSL models examine whether the human-designed semantic features are discriminative enough to distinguish different classes. Moreover, a one-way mapping suffers from the projection domain shift problem. In this article, we propose a Discriminative Dual Semantic Auto-encoder (DDSA), based on the encoder-decoder paradigm, to address both issues. DDSA constructs two bidirectional embeddings that connect the visual space and the semantic space through a learned aligned space, which contains discriminative information from both the visual features and the semantic features. Building on DDSA, we additionally propose a Deep DDSA to capture deep aligned features that are more conducive to zero-shot classification. The key to the proposed framework is that it implicitly extracts the principal information from the visual space and the semantic space to construct aligned features, which are not only semantic-preserving but also discriminative. Extensive experiments on five benchmarks (SUN, CUB, AWA1, AWA2 and aPY) demonstrate the effectiveness of the proposed framework, which obtains state-of-the-art performance in both the conventional ZSL and generalized ZSL settings.

Highlights

  • Humans can recognize about 30,000 basic object categories, along with their subordinate categories

  • Few zero-shot learning (ZSL) models examine whether the human-designed semantic features are discriminative enough to distinguish different classes

  • Based on the Discriminative Dual Semantic Auto-encoder (DDSA), we propose a Deep DDSA to capture deep aligned features that are more conducive to zero-shot classification

Summary

Introduction

Humans can recognize about 30,000 basic object categories, along with their subordinate categories. Humans can even recognize new classes dynamically from a few examples with little effort, but this is not easy for machine learning models, which usually require thousands of labelled samples for training. Motivated by the human ability to recognize unseen examples, the research area of zero-shot learning (ZSL) has received increasing interest. ZSL aims to use previously learned knowledge to recognize new categories without the need for labelled training data. When test samples may come from both seen and unseen categories, the task is called Generalized Zero-Shot Learning (GZSL). Because seen categories are usually more common than unseen ones in real-world applications, GZSL is more realistic and more challenging than ZSL for practical recognition tasks.
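The difference between the two settings can be illustrated with a toy example. The sketch below is not the DDSA method. It assumes that test samples have already been mapped into the semantic space by some learned embedding, and it uses a hypothetical helper, `predict`, that assigns each sample to the nearest class prototype by cosine similarity. The prototypes, class indices and sample values are invented for illustration. The only point is the label space: conventional ZSL searches over unseen classes only, while GZSL searches over seen and unseen classes together.

```python
import numpy as np

def predict(embedded, prototypes, candidates):
    """Assign each embedded sample to the most similar candidate class
    prototype, using cosine similarity (hypothetical helper)."""
    protos = prototypes[candidates]
    a = embedded / np.linalg.norm(embedded, axis=1, keepdims=True)
    b = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    best = np.argmax(a @ b.T, axis=1)      # index into the candidate list
    return np.asarray(candidates)[best]    # map back to class labels

# Toy semantic prototypes, one row per class (invented values).
prototypes = np.array([
    [1.0, 0.0, 0.0],   # class 0 (seen)
    [0.0, 1.0, 0.0],   # class 1 (seen)
    [0.0, 0.0, 1.0],   # class 2 (unseen)
    [1.0, 1.0, 0.0],   # class 3 (unseen)
])
seen_classes = [0, 1]
unseen_classes = [2, 3]

# Test samples, assumed to be already embedded in the semantic space.
test_embedded = np.array([
    [0.9, 0.8, 0.1],   # drawn from unseen class 3
    [0.1, 0.0, 0.9],   # drawn from unseen class 2
    [0.9, 0.2, 0.0],   # drawn from seen class 0 (appears only in GZSL)
])

# Conventional ZSL: unseen-class test samples, unseen-class label space.
zsl_pred = predict(test_embedded[:2], prototypes, unseen_classes)

# GZSL: test samples from both groups, label space covers all classes.
gzsl_pred = predict(test_embedded, prototypes, seen_classes + unseen_classes)

print("ZSL predictions: ", zsl_pred.tolist())
print("GZSL predictions:", gzsl_pred.tolist())
```

In the GZSL call the third sample has to compete against the unseen prototype of class 3 as well as its own seen class, which is one reason the larger label space makes the generalized setting harder.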

