Abstract

With the rapid growth of multimodal data, cross-modal retrieval, which takes one type of data as the query to retrieve relevant data of another type, has drawn growing attention in recent years. To enable direct matching between different modalities, the key issue in cross-modal retrieval is to eliminate the heterogeneity between modalities. Many existing approaches directly project samples of multimodal data into a common latent subspace under the supervision of class label information, where different samples within the same class contribute uniformly to the subspace construction. However, the subspace constructed by these methods may not reveal the true importance of each sample or the discrimination among different class labels. To tackle this problem, in this paper we regard different modalities as different domains and propose a Domain Invariant Subspace Learning (DISL) method to associate multimodal data. Specifically, DISL simultaneously minimizes the classification error with sample-wise weighting coefficients and preserves the structural similarity within and across modalities via graph regularization. Therefore, the subspace learned by DISL can reflect sample-wise importance and capture the discrimination of different class labels in multimodal data. Extensive experiments on three public datasets demonstrate that the proposed method outperforms several state-of-the-art algorithms on cross-modal retrieval tasks such as image-to-text and text-to-image retrieval.
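The abstract does not state the objective explicitly; as a rough sketch of how a sample-weighted classification term is commonly combined with a graph regularizer in subspace methods of this kind (the projections U_v, weights s_i^v, classifier W, Laplacian L, and trade-off parameters lambda and beta below are illustrative assumptions, not notation taken from the paper), the optimization might take a form such as

\min_{U_1, U_2, W} \; \sum_{v=1}^{2} \sum_{i=1}^{n} s_i^{v} \left\| W^{\top} U_v^{\top} x_i^{v} - y_i \right\|_2^2 \;+\; \lambda \, \mathrm{tr}\!\left( Z L Z^{\top} \right) \;+\; \beta \, \Omega(U_1, U_2, W)

where x_i^v is the i-th sample of modality v, y_i its class indicator vector, s_i^v a sample-wise weight, Z = [U_1^{\top} X^1, \, U_2^{\top} X^2] the projected data from both modalities, L a graph Laplacian encoding within- and cross-modal similarity, and \Omega a norm-based regularizer on the parameters. The first term captures the sample-wise weighted classification error and the second the structure-preserving graph regularization described in the abstract.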
