Abstract

Cross-modal (e.g., text-to-image or image-to-text) retrieval has received great attention with the flood of multi-modal social media data. Bridging the heterogeneity gap between modalities remains a considerable challenge. Existing methods project different modalities into a common space by minimizing the distance within heterogeneous pairs (intra-pair) in the new latent space. However, the relationships among these multi-modal pairs (inter-pair) are neglected, even though they are beneficial for eliminating the heterogeneity. In this paper, we propose a novel algorithm for cross-modal retrieval based on canonical correlation analysis that considers the high-order relationships among pairs (HCCA). Both the supervised setting (with additional semantic labels) and the unsupervised setting (without semantic labels) are considered simultaneously by treating the intra- and inter-pair correlations discriminatively. Moreover, the kernel trick is applied to HCCA to learn a non-linear projection, termed HKCCA. Extensive experiments conducted on three public datasets demonstrate the superiority of the proposed methods over state-of-the-art approaches to cross-modal retrieval.
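To make the setting concrete, the sketch below shows the classical CCA baseline that HCCA extends: paired text and image features are projected into a common latent space by maximizing intra-pair correlation, and retrieval is done by ranking in that space. This is not the authors' HCCA/HKCCA method (the inter-pair, high-order, and kernelized terms are omitted), and the feature dimensions and data here are illustrative assumptions.

```python
# Minimal CCA baseline for cross-modal retrieval (NOT the paper's HCCA/HKCCA).
# Dimensions and random features below are assumptions for illustration only.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_pairs, d_text, d_image, d_latent = 500, 300, 128, 10

# Paired features: row i of X_text and row i of X_image describe the same item.
X_text = rng.standard_normal((n_pairs, d_text))
X_image = rng.standard_normal((n_pairs, d_image))

# Learn linear projections that maximize correlation within each pair
# (the intra-pair objective that existing methods optimize).
cca = CCA(n_components=d_latent)
cca.fit(X_text, X_image)
Z_text, Z_image = cca.transform(X_text, X_image)

def l2_normalize(Z):
    # Normalize rows so that dot products become cosine similarities.
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Text-to-image retrieval: rank images by cosine similarity in the shared space.
sims = l2_normalize(Z_text) @ l2_normalize(Z_image).T  # shape (n_pairs, n_pairs)
top1 = sims.argmax(axis=1)  # best-matching image index for each text query
```

HCCA additionally models inter-pair (high-order) relationships among the multi-modal pairs, and HKCCA replaces the linear projections with kernel-induced non-linear ones.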
