Abstract

Cross-modal retrieval (i.e., image-query-text or text-query-image) is an active research topic in multimedia information retrieval, but the heterogeneity gap between modalities poses a critical challenge for multimodal data. Some researchers regard cross-modal retrieval as a learning-to-rank task and typically measure the similarity between two different modalities in a shared embedding subspace. However, previous methods mostly focus on constructing a discriminative objective function to optimize the common space, while ignoring the correlations within each single modality. In this paper, we treat cross-modal retrieval, from the perspective of optimizing the ranking model, as a listwise ranking problem and propose a novel method called learning to rank with relational graph and pointwise constraint ( $$ {\text{LR}}^{2} {\text{GP}} $$ ). In $$ {\text{LR}}^{2} {\text{GP}} $$ , we first propose a discriminative ranking model that exploits the relations within each single modality to improve ranking performance and thereby learn an optimal shared embedding subspace. Then, a pointwise constraint is introduced in the low-dimensional embedding subspace to compensate for the true loss during training, since the listwise method alone only optimizes the latent permutation from a global perspective. Finally, a dynamic interpolation algorithm, which gradually transitions from pointwise and pairwise to listwise learning, is adopted to fuse the loss functions in a reasonable way. Experiments on the Wikipedia and Pascal benchmark datasets demonstrate the effectiveness of the proposed method.
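The abstract does not specify the exact interpolation schedule, but the idea of gradually shifting from pointwise and pairwise to listwise learning can be sketched as a simple weighted combination of the three losses, assuming (hypothetically) a linear schedule over training epochs:

```python
def interpolated_loss(l_point: float, l_pair: float, l_list: float,
                      epoch: int, total_epochs: int) -> float:
    """Blend pointwise/pairwise and listwise losses with a dynamic weight.

    This is an illustrative sketch, not the paper's actual algorithm:
    `alpha` grows linearly from 0 to 1 over training, so early epochs
    emphasize the pointwise and pairwise terms and later epochs
    emphasize the listwise term.
    """
    alpha = min(epoch / total_epochs, 1.0)
    return (1.0 - alpha) * (l_point + l_pair) + alpha * l_list

# At the start of training, only pointwise + pairwise losses contribute;
# by the final epoch, only the listwise loss remains.
start = interpolated_loss(1.0, 1.0, 0.5, epoch=0, total_epochs=10)
end = interpolated_loss(1.0, 1.0, 0.5, epoch=10, total_epochs=10)
```

The specific losses, the linear schedule, and the function name here are assumptions for illustration; the paper's full text would define the actual formulation.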
