Abstract

Cross-modal retrieval (i.e., image-query-text or text-query-image) is an active research topic in multimedia information retrieval, but the heterogeneity gap between modalities poses a critical challenge for multimodal data. Some researchers regard cross-modal retrieval as a learning-to-rank task and typically measure the similarity between two different modalities in a shared embedding subspace. However, previous methods mostly focus on constructing a discriminative objective function to optimize the common space, while ignoring the correlations within each single modality. In this paper, we treat cross-modal retrieval, from the perspective of optimizing the ranking model, as a listwise ranking problem and propose a novel method called learning to rank with relational graph and pointwise constraint ( $$ {\text{LR}}^{2} {\text{GP}} $$ ). In $$ {\text{LR}}^{2} {\text{GP}} $$ , we first propose a discriminative ranking model that exploits the relations within each single modality to improve ranking performance and thereby learn an optimal shared embedding subspace. Then, a pointwise constraint is introduced in the low-dimensional embedding subspace to compensate for the true loss during training, since the listwise method alone only optimizes the latent permutation from a global perspective. Finally, a dynamic interpolation algorithm, which gradually transitions from pointwise and pairwise to listwise learning, is adopted to fuse the loss functions in a reasonable way. Experiments on the Wikipedia and Pascal benchmark datasets demonstrate the effectiveness of the proposed method.
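The abstract does not specify the exact interpolation schedule, but the idea of gradually shifting from pointwise and pairwise to listwise learning can be sketched as a simple weighted combination of the three losses, assuming (hypothetically) a linear schedule over training epochs:

```python
def interpolated_loss(l_point: float, l_pair: float, l_list: float,
                      epoch: int, total_epochs: int) -> float:
    """Blend pointwise/pairwise and listwise losses with a dynamic weight.

    This is an illustrative sketch, not the paper's actual algorithm:
    `alpha` grows linearly from 0 to 1 over training, so early epochs
    emphasize the pointwise and pairwise terms and later epochs
    emphasize the listwise term.
    """
    alpha = min(epoch / total_epochs, 1.0)
    return (1.0 - alpha) * (l_point + l_pair) + alpha * l_list

# At the start of training, only pointwise + pairwise losses contribute;
# by the final epoch, only the listwise loss remains.
start = interpolated_loss(1.0, 1.0, 0.5, epoch=0, total_epochs=10)
end = interpolated_loss(1.0, 1.0, 0.5, epoch=10, total_epochs=10)
```

The specific losses, the linear schedule, and the function name here are assumptions for illustration; the paper's full text would define the actual formulation.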
