Similarity and diversity induced paired projection for cross-modal retrieval

Jinxing Li, Mu Li, Guangming Lu, Bob Zhang, Hongpeng Yin, David Zhang

doi:10.1016/j.ins.2020.06.032

Abstract

The heterogeneity gap between modalities is a critical problem in many applications (e.g., retrieval). Considering that the main purpose of cross-modal learning is to learn a common representation, while each modality also contains its own specific components, a similarity and diversity induced paired projection (SDPP) method is proposed in this paper. SDPP not only extracts the correlation in a common subspace, but also removes the view-specific information that does not contribute to the task. To model the specific components, the Hilbert-Schmidt Independence Criterion (HSIC) is introduced as a co-regularization term that explicitly enforces diversity. Additionally, unlike some existing subspace learning methods that are time-consuming in the testing phase, a paired projection strategy is exploited, which obtains the similar information in a simple but effective way. To optimize the presented approach, an efficient algorithm is designed to update the variables alternately. Finally, we apply our strategy to cross-modal retrieval, and experimental results on several real-world datasets substantiate the effectiveness and superiority of our model compared with other state-of-the-art methods.
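
To make the diversity term concrete, the following is a minimal sketch of the standard empirical HSIC estimator, HSIC(K, L) = tr(K H L H) / (n - 1)^2, where H is the centering matrix. The abstract does not state which kernel SDPP uses or how the term enters the objective, so the linear kernel and the function names `linear_kernel` and `hsic` are assumptions made purely for illustration, not the authors' implementation.

```python
import numpy as np


def linear_kernel(Z):
    """Gram matrix of a linear kernel (an assumed, illustrative choice)."""
    return Z @ Z.T


def hsic(K, L):
    """Empirical HSIC between two n x n Gram matrices K and L.

    Uses the standard biased estimator tr(K H L H) / (n - 1)^2,
    where H = I - (1/n) 11^T centers the data in feature space.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy data: two hypothetical "specific" components over the same 4 samples.
a = np.array([[1.0], [-1.0], [1.0], [-1.0]])
b = np.array([[1.0], [1.0], [-1.0], [-1.0]])

dependent = hsic(linear_kernel(a), linear_kernel(a))    # a compared with itself
uncorrelated = hsic(linear_kernel(a), linear_kernel(b))  # a and b are orthogonal
print(dependent, uncorrelated)
```

A smaller HSIC value indicates weaker statistical dependence between the two representations, which is why minimizing it can serve as a diversity-enforcing regularizer. With a linear kernel the estimator only detects linear dependence; a characteristic kernel such as the Gaussian kernel would be needed to capture general dependence.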

Full Text