Abstract

Cross-modal retrieval aims to retrieve relevant information across different media types, such as images and text. However, the inconsistent distributions of different modalities create a heterogeneity gap, which remains an intractable issue. Previous research seeks a shared semantic space for the various modalities, in which their similarity can be computed with a uniform measure. Graph representation learning has shown excellent performance in many tasks, including information retrieval. However, it is difficult to construct a cross-modal graph that best captures the semantics of multi-modal data. Moreover, the projected representations of the different modalities, optimized jointly by supervised class labels and unsupervised graph context learning, are insufficiently aligned. To address these issues, this paper presents Adversarial Pre-optimized Graph Representation Learning (AP-GRL) for cross-modal retrieval. Like mainstream methods, AP-GRL learns a shared representation for the various modalities. We design an adaptive pre-optimization mechanism that optimizes the cross-modal data and their relationships, in contrast to the fixed graphs used in previous work: the initially constructed cross-modal graph is refined by maximum a posteriori probability estimation. To fully exploit the graph, we propose a double-order sampling strategy that incorporates both the breadth and depth of the graph into representation learning. In addition, we adopt adversarial learning with the Wasserstein distance to reduce the domain discrepancy when learning domain-invariant representations. Extensive experiments on five widely used datasets demonstrate that AP-GRL outperforms state-of-the-art cross-modal retrieval methods.
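
To make the adversarial alignment step concrete, the sketch below shows a generic Wasserstein-critic update (WGAN-GP style) that pushes image and text projections toward a common distribution in a shared space. This is a minimal illustration, not the paper's actual AP-GRL architecture: the feature dimensions (4096-d image, 300-d text, 256-d shared space), layer sizes, batch size, and the gradient-penalty weight are all assumed for the example.

```python
import torch
import torch.nn as nn

# Assumed, illustrative dimensions (not from the paper).
IMG_DIM, TXT_DIM, SHARED_DIM = 4096, 300, 256

class Projector(nn.Module):
    """Maps one modality's features into the shared representation space."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, out_dim),
        )
    def forward(self, x):
        return self.net(x)

class Critic(nn.Module):
    """Scores shared-space vectors; trained to estimate the Wasserstein
    distance between the image and text embedding distributions."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, z):
        return self.net(z)

def gradient_penalty(critic, z_img, z_txt):
    """WGAN-GP penalty that keeps the critic approximately 1-Lipschitz."""
    eps = torch.rand(z_img.size(0), 1, device=z_img.device)
    z_hat = (eps * z_img + (1 - eps) * z_txt).requires_grad_(True)
    grads = torch.autograd.grad(critic(z_hat).sum(), z_hat, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# One critic step: the critic maximizes the score gap between modalities;
# the projectors would later be updated to minimize it, shrinking the
# heterogeneity gap in the shared space.
img_proj = Projector(IMG_DIM, SHARED_DIM)
txt_proj = Projector(TXT_DIM, SHARED_DIM)
critic = Critic(SHARED_DIM)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

img_feats = torch.randn(32, IMG_DIM)   # stand-in for a batch of image features
txt_feats = torch.randn(32, TXT_DIM)   # stand-in for a batch of text features

z_img = img_proj(img_feats).detach()   # projections are frozen for the critic step
z_txt = txt_proj(txt_feats).detach()
w_estimate = critic(z_img).mean() - critic(z_txt).mean()
critic_loss = -w_estimate + 10.0 * gradient_penalty(critic, z_img, z_txt)

critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()
```

In a full training loop this critic update alternates with a projector update that minimizes the estimated Wasserstein distance alongside the supervised and graph-context objectives described in the abstract.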
