Abstract
Zero Shot Learning (ZSL) has attracted increasing attention because it can recognize objects from unseen classes. Among ZSL methods, low rank based strategies have achieved remarkable success. However, traditional low rank based methods usually assume that the varied visual features of a single class can all be projected to a single attribute, ignoring the background information and other noise present in the visual features. This assumption is unreasonable and often leads to poor performance when the variance within a class is large. In this paper, a novel method called Prototype Relaxation with Robust Principal Component Analysis (RPCA) is proposed to relax this assumption by adding a sparse noise constraint. In addition, to avoid confusion between similar classes, an orthogonal constraint is employed to disperse all class prototypes, of both seen and unseen classes, in the latent space. Furthermore, to alleviate the domain shift problem, vectors from the latent space are used to reconstruct the visual features and the semantic attributes respectively. The hubness problem is also mitigated by applying the max probability model in all three spaces. Extensive experiments are conducted on four popular datasets, and the results demonstrate the superiority of the proposed method.
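For orientation, the decomposition that the name RPCA usually refers to can be written as below. This is the standard principal component pursuit formulation, given here only as background under the assumption that the sparse noise constraint plays the role of $S$; it is not the full objective of the proposed method, and the symbols $X$, $L$, $S$ and $\lambda$ are introduced here for illustration.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad X = L + S
```

Here $\|L\|_{*}$ is the nuclear norm (the sum of singular values), which encourages a low rank component, and $\|S\|_{1}$ is the entrywise $\ell_1$ norm, which encourages a sparse component that can absorb noise.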
Highlights
Owing to the rapid development of deep learning, image classification has made remarkable progress [1]. For example, the Deep Residual Network (ResNet) [2] achieves over 95% top-5 accuracy on the 1000-category ImageNet dataset [3], which has been shown to surpass human recognition ability. However, these image classification methods work only in the closed-set setting: when a sample arrives from a new category that did not appear in the training set, they will inevitably produce a wrong result.
The contributions of our work can be summarized as follows: 1) To avoid the unreasonable projection assumption in traditional low rank based Zero Shot Learning (ZSL) methods, we propose a prototype relaxation approach based on Robust Principal Component Analysis (RPCA), which relaxes the assumption by adding a sparse term that absorbs the noisy, redundant information; 2) An orthogonal constraint in the latent space is employed to disperse all class prototypes by making them normalized and mutually orthogonal; 3) Self-reconstruction is applied to alleviate the domain shift problem, and probabilistic prediction in all three spaces is used to further mitigate the hubness problem; 4) Experiments in both the inductive and transductive settings are conducted on four popular datasets, and the results on both ZSL and Generalized ZSL (GZSL) demonstrate the superiority of the proposed method.
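The idea in contribution 2, prototypes that are normalized and mutually orthogonal, can be illustrated with a small NumPy sketch. This is only one common way to measure how far a set of prototypes is from orthonormal (the squared Frobenius distance between their Gram matrix and the identity); it is an assumption made for illustration and may differ from the constraint actually used in the paper. The function names and the toy prototypes are hypothetical.

```python
import numpy as np


def normalize_rows(prototypes: np.ndarray) -> np.ndarray:
    """Scale each class prototype (one per row) to unit Euclidean length."""
    norms = np.linalg.norm(prototypes, axis=1, keepdims=True)
    return prototypes / np.maximum(norms, 1e-12)  # guard against zero rows


def orthogonality_penalty(prototypes: np.ndarray) -> float:
    """Squared Frobenius distance between the Gram matrix P P^T and the identity.

    The value is zero exactly when the rows are unit length and mutually
    orthogonal. This is an illustrative penalty, not the paper's formulation.
    """
    gram = prototypes @ prototypes.T
    identity = np.eye(prototypes.shape[0])
    return float(np.sum((gram - identity) ** 2))


# Hypothetical latent prototypes for three classes in a 4-dimensional space.
dispersed = np.eye(3, 4)  # rows are already orthonormal
confusable = normalize_rows(np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],  # nearly parallel to the first class
    [0.0, 0.0, 1.0, 0.0],
]))

print(orthogonality_penalty(dispersed))   # no overlap between prototypes
print(orthogonality_penalty(confusable))  # large: two classes nearly coincide
```

A penalty of this form can reach zero only when the number of classes does not exceed the latent dimension, since a space of dimension d holds at most d mutually orthogonal unit vectors.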
Summary
Owing to the rapid development of deep learning, image classification has made remarkable progress [1]. For example, the Deep Residual Network (ResNet) [2] achieves over 95% top-5 accuracy on the 1000-category ImageNet dataset [3], which has been shown to surpass human recognition ability. The contributions of our work can be summarized as follows: 1) To avoid the unreasonable projection assumption in traditional low rank based ZSL methods, we propose a prototype relaxation approach based on RPCA, which relaxes the assumption by adding a sparse term that absorbs the noisy, redundant information; 2) An orthogonal constraint in the latent space is employed to disperse all class prototypes by making them normalized and mutually orthogonal; 3) Self-reconstruction is applied to alleviate the domain shift problem, and probabilistic prediction in all three spaces is used to further mitigate the hubness problem; 4) Experiments in both the inductive and transductive settings are conducted on four popular datasets, and the results on both ZSL and GZSL demonstrate the superiority of the proposed method. Many other methods have also been developed for this more realistic setting [28], [29].