Abstract

Most subspace-based cross-modal retrieval methods focus only on learning the projection matrices that map different modalities into a common subspace, while paying little attention to the specificity of the retrieval task and to class information. To address these two limitations and make full use of unlabelled data, we propose a novel semi-supervised method for cross-modal retrieval named modal-related retrieval based on discriminative comapping (MRRDC). Task-specific projection matrices are learned to map multimodal data into a common subspace. During projection matrix learning, a linear discriminant constraint is introduced to preserve the original class information of each modal space. An iterative optimization algorithm based on label propagation is presented to solve the proposed joint learning formulation. Experimental results on several datasets demonstrate the superiority of our method over state-of-the-art subspace methods.
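As an illustration of the semi-supervised component, the following is a minimal sketch of graph-based label propagation in the style of Zhou et al.: soft labels are spread from labelled to unlabelled samples over a Gaussian affinity graph built on features in the learned common subspace. The function name, the affinity construction, and the parameters alpha and sigma are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def propagate_labels(X, y_labelled, labelled_idx, n_classes, alpha=0.99, sigma=1.0):
    """Generic graph-based label propagation (illustrative, not the paper's exact algorithm).

    X            : (n, d) features of all samples in the common subspace
    y_labelled   : integer class labels of the labelled samples
    labelled_idx : indices of the labelled samples in X
    Returns soft label scores F of shape (n, n_classes).
    """
    n = X.shape[0]

    # Gaussian affinity matrix over all (labelled + unlabelled) samples.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    S = D_inv_sqrt @ W @ D_inv_sqrt

    # Initial label matrix: one-hot rows for labelled samples, zeros for unlabelled ones.
    Y = np.zeros((n, n_classes))
    Y[np.asarray(labelled_idx), np.asarray(y_labelled)] = 1.0

    # Closed-form fixed point of the propagation: F = (1 - alpha) (I - alpha S)^{-1} Y.
    F = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F
```

Unlabelled samples can then be assigned the class with the largest score in their row of F, which is one common way such propagated labels feed back into projection learning.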

Highlights

  • In real applications, data are often represented in different ways or obtained from various domains

  • We observe that our method outperforms its counterparts. This may be because the projection matrices preserve more discriminative class information via semi-supervised learning. The common subspace of our method is more discriminative and effective because it further exploits the class semantics of intramodality and intermodality similarity simultaneously

  • We find that, in most cases, generalized multiview MFA (GMMFA), generalized multiview LDA (GMLDA), MDCR, and modal-related retrieval based on discriminative comapping (MRRDC) perform better than partial least squares (PLS), canonical correlation analysis (CCA), SM, and SCM, and that CNN image features outperform shallow features. This is because PLS, CCA, SM, and SCM use only pairwise information, whereas the other approaches add class information to their objective functions, which provides better separation between categories in the latent common subspace

Summary

Introduction

Data are often represented in different ways or obtained from various domains. Data with the same semantics may exist in different modalities or exhibit heterogeneous properties. With the rapid growth of multimodal data, there is an urgent need to analyze data obtained from different modalities effectively [1,2,3,4,5]. Multimodal analysis has therefore attracted much attention; the most common approach is to ensemble the multimodal data to improve performance [6,7,8,9]. Cross-modal retrieval is an efficient way to retrieve data across different modalities. A typical example is to take an image as a query to retrieve related texts (I2T), or to search for images using a textual description (T2I).
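To make the two retrieval directions concrete, here is a minimal sketch that assumes projection matrices W_img and W_txt have already been learned by some subspace method (these names are placeholders, not defined in the paper): each modality is mapped into the common subspace and candidates are ranked by cosine similarity.

```python
import numpy as np

def rank_cross_modal(query_feat, W_query, gallery_feats, W_gallery):
    """Rank gallery items of the other modality for one query.

    query_feat    : (d_q,)   raw feature of the query (e.g. an image)
    W_query       : (d_q, k) projection matrix of the query modality
    gallery_feats : (n, d_g) raw features of the other modality (e.g. texts)
    W_gallery     : (d_g, k) projection matrix of the gallery modality
    Returns gallery indices sorted from most to least similar.
    """
    q = query_feat @ W_query       # map the query into the common subspace
    G = gallery_feats @ W_gallery  # map the gallery into the same subspace

    # Cosine similarity between the query and every gallery item.
    q = q / (np.linalg.norm(q) + 1e-12)
    G = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
    scores = G @ q
    return np.argsort(-scores)

# I2T: an image query against a text gallery; T2I simply swaps the roles.
# ranking = rank_cross_modal(img_feat, W_img, txt_feats, W_txt)
```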
