Abstract

Heterogeneity of multi-modal data is the key challenge in multimedia cross-modal retrieval, and many approaches have been developed to address it. The mainstream, subspace learning based approaches learn a latent shared subspace in which similarities between cross-modal data can be measured, and they have shown remarkable performance in practical cross-modal retrieval tasks. However, most existing approaches essentially amount to feature dimension reduction of the different modalities into a shared subspace, which does not fundamentally resolve the heterogeneity issue; as a result, they often fail to obtain satisfactory results. According to Hilbert space theory, different Hilbert spaces with the same dimension are isomorphic. Based on this premise, isomorphic mapping subspaces can be regarded as a single space shared by multi-modal data. To this end, in this paper we propose a correlation-based cross-modal subspace learning model via kernel dependence maximization (KDM). Unlike most existing correlation-based subspace learning methods, the proposed KDM learns a subspace representation for each modality by maximizing the kernel dependence (correlation) instead of directly maximizing the feature correlations between multi-modal data. Specifically, we first map the data of each modality individually into a separate Hilbert space, with all spaces having the same dimension; we then compute a kernel matrix in each Hilbert space and measure the correlations between modalities based on these kernels. Experimental results show the effectiveness and competitiveness of the proposed KDM against classic subspace learning approaches.
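
The last step (per-modality kernel matrices, then a kernel-based correlation) can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: kernel dependence is measured here with the biased empirical Hilbert-Schmidt Independence Criterion (HSIC), the kernels are linear, and the projection matrices `P_x` and `P_y` are random placeholders rather than learned. The data are synthetic and all names are hypothetical.

```python
import numpy as np


def linear_kernel(Z):
    """Gram matrix of a linear kernel on the rows of Z (assumed kernel choice)."""
    return Z @ Z.T


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


rng = np.random.default_rng(0)
n, d_x, d_y, d = 200, 10, 6, 4  # samples, two feature sizes, shared subspace size

# Synthetic paired "modalities": Y depends on X through a random linear map.
X = rng.normal(size=(n, d_x))
Y = X @ rng.normal(size=(d_x, d_y)) + 0.1 * rng.normal(size=(n, d_y))
# Shuffling the rows of Y keeps its marginal distribution but breaks the pairing.
Y_shuffled = Y[rng.permutation(n)]

# Placeholder projections into subspaces of equal dimension d (random, not learned).
P_x = rng.normal(size=(d_x, d))
P_y = rng.normal(size=(d_y, d))

# One kernel matrix per modality, computed in its own projected space.
K_x = linear_kernel(X @ P_x)
K_y = linear_kernel(Y @ P_y)
K_y_shuffled = linear_kernel(Y_shuffled @ P_y)

dep = hsic(K_x, K_y)           # dependence for correctly paired samples
ind = hsic(K_x, K_y_shuffled)  # dependence after the pairing is destroyed
print(f"paired: {dep:.3f}, shuffled: {ind:.3f}")
```

In a model of the kind described, the projections would presumably be optimized to maximize this dependence score rather than drawn at random; the sketch only shows how the score is computed from two kernel matrices.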
