Abstract

Cross-modal retrieval uses data from one modality as the query to search for related data in other modalities (e.g., images vs. texts). Because a heterogeneity gap exists between different media types, mainstream methods focus on reducing the modality gap through common-space learning. However, this gap is large and very hard to eliminate completely. In addition, representations within the same modality are diverse, an important fact that most existing methods ignore. In this paper, we propose a novel cross-modal retrieval method based on Similarity-preserving Learning and Semantic Average Embedding (SLSAE). The method rests on two key ideas: reducing the modality gap through similarity-preserving learning, and using semantic average embedding to weaken the impact of diversity in the common space. Similarity-preserving learning pushes embeddings of the same category together and pulls embeddings of different categories apart. Eliminating the influence of embedding diversity improves performance and robustness, which benefits real-world cross-modal retrieval applications. The proposed model is concise and extends flexibly to multimodal retrieval. Comprehensive experiments show that our method significantly outperforms state-of-the-art methods in bimodal cross-modal retrieval and also achieves excellent performance in multimodal scenarios.
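The similarity-preserving idea described above can be sketched as a pairwise distance objective over the common space. The function below is a minimal illustrative sketch, not the paper's exact objective; the margin value and function names are our own assumptions.

```python
import numpy as np

def similarity_preserving_loss(img_emb, txt_emb, labels, margin=0.5):
    """Toy pairwise loss: pull embeddings of the same category together
    and push embeddings of different categories at least `margin` apart.
    Illustrative sketch only; not the authors' implementation."""
    emb = np.vstack([img_emb, txt_emb])      # stack both modalities in the common space
    lab = np.concatenate([labels, labels])   # shared category labels
    # pairwise Euclidean distances between all embeddings
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = lab[:, None] == lab[None, :]
    pos = dist[same].mean()                                  # same category: minimize distance
    neg = np.maximum(0.0, margin - dist[~same]).mean()       # different category: hinge on margin
    return pos + neg
```

When same-category embeddings coincide and different-category embeddings lie beyond the margin, the loss vanishes, matching the push/pull behavior described in the abstract.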

Highlights

  • With the development of the Internet and digital media technology, we have entered the era of big data

  • Cross-modal retrieval has a wide range of applications, including intelligent search engines and multimedia data management

  • We propose a novel method, cross-modal retrieval via similarity-preserving learning and semantic average embedding (SLSAE), to address the issues mentioned above

Summary

INTRODUCTION

With the development of the Internet and digital media technology, we have entered the era of big data. Although previous cross-modal retrieval methods perform well, two problems remain: 1) the large heterogeneous modality gap cannot be removed completely, and 2) embeddings of the same modality are diverse, even when they belong to the same object. We propose a novel method, cross-modal retrieval via similarity-preserving learning and semantic average embedding (SLSAE), to address these issues. The highlights of the SLSAE framework are that the modality gap is reduced by a distance constraint and that the influence of embedding diversity in the common space is weakened by semantic average embedding. 1) A novel semantic similarity-preserving learning method is proposed, which tends to gather data from different modalities together in the common space.
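The semantic average embedding step can be sketched as follows: embeddings of each category are averaged into one prototype, damping within-class diversity, and retrieval ranks categories by similarity to these prototypes. This is a hedged illustration of the idea; the function names and cosine-similarity choice are our assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def semantic_average_embeddings(embeddings, labels):
    """Average the common-space embeddings of each category into a single
    prototype, weakening the influence of within-class diversity."""
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def retrieve(query, classes, prototypes):
    """Rank categories by cosine similarity between a query embedding
    and the per-category average embeddings."""
    q = query / (np.linalg.norm(query) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    sims = p @ q
    order = np.argsort(-sims)
    return classes[order], sims[order]
```

Because every modality is embedded in the same common space, the same prototypes serve queries from any modality, which is why the scheme extends naturally to the multimodal setting.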

RELATED WORK
FRAMEWORK OF SLSAE
SIMILARITY-PRESERVING LEARNING
SEMANTIC AVERAGE EMBEDDING FOR RETRIEVAL
EXTENSION TO MULTIMODAL RETRIEVAL
EXPERIMENTS
DATASETS AND FEATURES
Methods
Query Methods
FURTHER ANALYSIS ON SLSAE
Findings
CONCLUSION
