Abstract
Zero-shot learning (ZSL) is an effective way to recognize classes for which no training samples are available. Most existing ZSL models focus on directly learning an embedding between the visual space and the semantic space. However, few ZSL models examine whether the human-designed semantic features are discriminative enough to distinguish different classes. Moreover, a one-way mapping suffers from the projection domain shift problem. In this article, we propose a Discriminative Dual Semantic Auto-encoder (DDSA), based on the encoder-decoder paradigm, to address these issues. DDSA constructs two bidirectional embeddings that connect the visual space and the semantic space through a learned aligned space, which contains discriminative information from both the visual features and the semantic features. Building on DDSA, we additionally propose a Deep DDSA to capture deep aligned features that are more conducive to zero-shot classification. The key to the proposed framework is that it implicitly extracts the principal information from the visual space and the semantic space to construct aligned features, which are not only semantic-preserving but also discriminative. Extensive experiments on five benchmarks (SUN, CUB, AWA1, AWA2 and aPY) demonstrate the effectiveness of the proposed framework, with state-of-the-art performance obtained in both the conventional ZSL and generalized ZSL settings.
Highlights
There are about 30,000 basic object categories and subordinate ones that humans can recognize in the world
Few zero-shot learning (ZSL) models examine whether the human-designed semantic features are discriminative enough to distinguish different classes
Based on the Discriminative Dual Semantic Auto-encoder (DDSA), we propose a Deep DDSA to capture deep aligned features that are more conducive to zero-shot classification
Summary
There are about 30,000 basic object categories and subordinate ones that humans can recognize in the world. Humans can even recognize new classes dynamically from a few examples with little effort, but this is not easy for computer-based machine learning models, which usually require thousands of labelled samples for training. Motivated by the human ability to recognize unseen examples, the research area of zero-shot learning (ZSL) has received increasing interest. ZSL aims to make good use of previously learned knowledge to recognize new categories without the need for labelled training data. When test samples may come from both seen and unseen categories, the setting is called Generalized Zero-Shot Learning (GZSL). Because seen categories are usually more common than unseen ones in real-world applications, GZSL is more realistic and more challenging than ZSL for practical recognition tasks.
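The difference between the two settings can be illustrated with a minimal sketch. It assumes that test samples have already been embedded into the same space as per-class semantic prototypes and that classification is done by nearest prototype under cosine similarity. This is a common ZSL baseline chosen here for illustration, not the DDSA classifier, and the function name `predict`, the toy prototypes, and the class indices are all hypothetical.

```python
import numpy as np

def predict(embedded, prototypes, candidate_classes):
    """Label each embedded sample with the most similar candidate class prototype.

    Assumption: cosine-similarity nearest-prototype rule (illustrative only).
    """
    candidate_classes = np.asarray(candidate_classes)
    protos = prototypes[candidate_classes]
    # Normalize rows so that the dot product equals cosine similarity.
    a = embedded / np.linalg.norm(embedded, axis=1, keepdims=True)
    b = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return candidate_classes[np.argmax(a @ b.T, axis=1)]

# Toy semantic prototypes (hypothetical): one row per class.
prototypes = np.array([
    [1.0, 0.0, 0.0],  # class 0 (seen)
    [0.0, 1.0, 0.0],  # class 1 (seen)
    [0.0, 0.0, 1.0],  # class 2 (unseen)
    [1.0, 1.0, 0.0],  # class 3 (unseen)
])
seen, unseen = [0, 1], [2, 3]

# Toy embedded test samples (hypothetical).
unseen_test = np.array([[0.9, 0.8, 0.1],    # near class 3
                        [0.1, 0.0, 0.9]])   # near class 2
seen_test = np.array([[0.9, 0.1, 0.0]])     # near class 0
mixed_test = np.vstack([unseen_test, seen_test])

# Conventional ZSL: test samples and candidate labels are unseen classes only.
zsl_pred = predict(unseen_test, prototypes, unseen)

# GZSL: test samples come from both groups, so the candidates are all classes.
gzsl_pred = predict(mixed_test, prototypes, seen + unseen)

print("ZSL predictions: ", zsl_pred)
print("GZSL predictions:", gzsl_pred)
```

The only change between the two calls is the candidate label set and the pool of test samples. The larger search space in the second call is one reason GZSL is the harder setting.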