Abstract

Cross-modal retrieval uses data from one modality as the query to search for related data in other modalities (e.g., images vs. texts). Because a heterogeneity gap exists between different media types, mainstream methods focus on reducing the modality gap through common-space learning. However, this gap is large and very hard to eliminate completely. In addition, representations within the same modality are diverse, an important fact that most existing methods ignore. In this paper, we propose a novel cross-modal retrieval method based on Similarity-preserving Learning and Semantic Average Embedding (SLSAE). The method rests on two key ideas: reducing the modality gap through similarity-preserving learning, and using semantic average embedding to weaken the impact of diversity in the common space. Similarity-preserving learning pushes embeddings of the same category together and pulls embeddings of different categories apart. Eliminating the influence of embedding diversity improves performance and robustness, which benefits real-world cross-modal retrieval applications. The proposed model is concise and extends flexibly to multimodal retrieval. Comprehensive experiments show that our method significantly outperforms state-of-the-art methods in bimodal cross-modal retrieval and also achieves excellent performance in multimodal scenarios.
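The similarity-preserving idea described above can be sketched as a pairwise distance objective over the common space. The function below is a minimal illustrative sketch, not the paper's exact objective; the margin value and function names are our own assumptions.

```python
import numpy as np

def similarity_preserving_loss(img_emb, txt_emb, labels, margin=0.5):
    """Toy pairwise loss: pull embeddings of the same category together
    and push embeddings of different categories at least `margin` apart.
    Illustrative sketch only; not the authors' implementation."""
    emb = np.vstack([img_emb, txt_emb])      # stack both modalities in the common space
    lab = np.concatenate([labels, labels])   # shared category labels
    # pairwise Euclidean distances between all embeddings
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = lab[:, None] == lab[None, :]
    pos = dist[same].mean()                                  # same category: minimize distance
    neg = np.maximum(0.0, margin - dist[~same]).mean()       # different category: hinge on margin
    return pos + neg
```

When same-category embeddings coincide and different-category embeddings lie beyond the margin, the loss vanishes, matching the push/pull behavior described in the abstract.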

Highlights

  • With the development of the Internet and digital media technology, we have entered the era of big data

  • Cross-modal retrieval has a wide range of applications, including intelligent search engines and multimedia data management

  • We propose a novel method, cross-modal retrieval via similarity-preserving learning and semantic average embedding (SLSAE), to address the issues mentioned above

Summary

INTRODUCTION

With the development of the Internet and digital media technology, we have entered the era of big data. Although previous cross-modal retrieval methods perform well, two problems remain: 1) the large heterogeneous modality gap cannot be removed completely, and 2) embeddings of the same modality are diverse, even when they belong to the same object. We propose a novel method, cross-modal retrieval via similarity-preserving learning and semantic average embedding (SLSAE), to address these issues. The highlights of the SLSAE framework are that the modality gap is reduced by a distance constraint and that the influence of embedding diversity in the common space is weakened by semantic average embedding. 1) A novel semantic similarity-preserving learning method is proposed, which tends to gather data from different modalities together in the common space.
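The semantic average embedding step can be sketched as follows: embeddings of each category are averaged into one prototype, damping within-class diversity, and retrieval ranks categories by similarity to these prototypes. This is a hedged illustration of the idea; the function names and cosine-similarity choice are our assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def semantic_average_embeddings(embeddings, labels):
    """Average the common-space embeddings of each category into a single
    prototype, weakening the influence of within-class diversity."""
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def retrieve(query, classes, prototypes):
    """Rank categories by cosine similarity between a query embedding
    and the per-category average embeddings."""
    q = query / (np.linalg.norm(query) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    sims = p @ q
    order = np.argsort(-sims)
    return classes[order], sims[order]
```

Because every modality is embedded in the same common space, the same prototypes serve queries from any modality, which is why the scheme extends naturally to the multimodal setting.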

RELATED WORK
FRAMEWORK OF SLSAE
SIMILARITY-PRESERVING LEARNING
SEMANTIC AVERAGE EMBEDDING FOR RETRIEVAL
EXTENSION TO MULTIMODAL RETRIEVAL
EXPERIMENTS
DATASETS AND FEATURES
Methods
Query Methods
FURTHER ANALYSIS ON SLSAE
Findings
CONCLUSION
