Abstract

Zero-shot learning (ZSL) for visual recognition aims to recognize objects of unseen classes by mapping visual features to an embedding space spanned by class semantic information. However, the semantic gap between visual features and their underlying semantics remains a major obstacle in ZSL. Conventional ZSL methods typically construct this mapping from the original visual features, which are independent of the ZSL task, and this degrades prediction performance. In this paper, we propose an effective method to uncover a latent representation of the data that is suited to zero-shot classification. Specifically, we formulate a novel framework that jointly learns a latent subspace and a cross-modal embedding linking visual features with their semantic representations. The proposed framework combines feature learning and semantics prediction, so that the learned data representation is more discriminative for predicting the semantic vectors, which improves the overall classification performance. To learn a robust latent subspace, we explicitly avoid information loss by requiring that the original data can be reconstructed from the learned representation. An efficient algorithm is designed to solve the proposed optimization problem. To fully exploit the intrinsic geometric structure of the data, we develop a manifold regularization strategy that refines the learned semantic representations, leading to further improvements in classification performance. To validate the effectiveness of the proposed approach, extensive experiments are conducted on three ZSL benchmarks, and encouraging results are achieved in comparison with state-of-the-art ZSL methods.
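
To make the setting concrete, the following is a minimal sketch of the generic embedding-based ZSL pipeline that the abstract takes as its starting point. It is not the proposed method: there is no latent subspace, reconstruction term, or manifold regularization. It assumes a plain ridge-regression mapping from visual features to semantic vectors and cosine-similarity matching against unseen-class prototypes. The function names and the `reg` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_linear_embedding(X, S, reg=1.0):
    """Ridge regression: learn W so that X @ W approximates S.

    X: (n, d) visual features of seen-class samples.
    S: (n, k) semantic vectors (e.g. attributes) of those samples' classes.
    reg: ridge penalty (an illustrative assumption, not from the paper).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)

def predict_unseen(X_test, W, prototypes):
    """Label each test sample with the index of the most similar unseen class.

    prototypes: (c, k) semantic vectors of the unseen classes.
    Similarity is cosine similarity in the semantic space.
    """
    P = X_test @ W
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    Q = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return np.argmax(P @ Q.T, axis=1)
```

In this baseline the mapping is learned directly on the original visual features, which is the limitation the abstract attributes to conventional methods. The proposed framework instead learns the representation jointly with the embedding.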
