Semantic-guided autoencoder adversarial hashing for large-scale cross-modal retrieval

Mingyong Li, Qiqi Li, Degang Yang, Yan Ma

doi:10.1007/s40747-021-00615-3

Open Access

https://doi.org/10.1007/s40747-021-00615-3

Journal: Complex & Intelligent Systems	Publication Date: Jan 4, 2022
Citations: 4	License type: open-access

Affiliation: Chongqing Normal University

Abstract

With the vigorous development of mobile Internet technology and the popularization of smart devices, the amount of multimedia data has exploded and its forms have become increasingly diverse. Single-modal data retrieval no longer satisfies people’s demand for information, and cross-modal retrieval has become a research hotspot in recent years. Owing to the strong feature learning ability of deep learning, cross-modal deep hashing has been extensively studied. However, the similarity between different modalities is difficult to measure directly because their distributions and representations differ. It is therefore urgent to eliminate the modality gap and improve retrieval accuracy. Some previous work has introduced GANs into cross-modal hashing to reduce the semantic differences between modalities. However, most existing GAN-based cross-modal hashing methods suffer from issues such as unstable network training and vanishing gradients, which hinder the elimination of modal differences. To address this, the paper proposes a novel Semantic-guided Autoencoder Adversarial Hashing method for cross-modal retrieval (SAAH). First, two kinds of adversarial autoencoder networks, under the guidance of semantic multi-labels, maximize the semantic relevance of instances and maintain cross-modal invariance. Second, under semantic supervision, the adversarial module guides the feature learning process and maintains the modality relations. In addition, to maintain the inter-modal correlation of all similar pairs, the paper uses two types of loss functions to preserve similarity. To verify the effectiveness of the proposed method, extensive experiments were conducted on three widely used cross-modal datasets (MIRFLICKR, NUS-WIDE and MS COCO). Compared with several representative advanced cross-modal retrieval methods, SAAH achieved leading retrieval performance.
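As a rough illustration of why hash codes suit large-scale cross-modal retrieval, the following minimal sketch ranks text items against an image query by Hamming distance between binary codes. It is not the SAAH model: the feature vectors are made-up stand-ins for the outputs of hypothetical image and text networks, sign thresholding is assumed as the binarization step, and the function names are illustrative only.

```python
from typing import List, Sequence


def sign_binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued feature vector into an integer bit code.

    Assumption: non-negative values map to bit 1, negative values to bit 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Hypothetical toy features; a trained model would produce these.
image_query = sign_binarize([0.9, -0.2, 0.4, -0.7])
text_database = [
    sign_binarize([0.8, -0.1, 0.3, -0.5]),   # same sign pattern as the query
    sign_binarize([-0.6, 0.7, -0.2, 0.9]),   # opposite sign pattern
    sign_binarize([0.5, 0.2, 0.1, -0.3]),    # differs in one position
]

ranking = rank_by_hamming(image_query, text_database)
print(ranking)
```

Because both modalities are mapped into the same binary space, comparing an image with a text reduces to an XOR and a bit count, which is what makes retrieval over large databases cheap once the modality gap has been closed during training.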

Highlights

In recent years, with the widespread popularity of the Internet and mobile devices, the scale of multimodal data has increased dramatically. While the amount of multimedia data has exploded, its forms have also become more and more diversified.
Most existing GAN-based cross-modal retrieval methods use the original GAN loss function and training strategy, which leads to unstable network training and vanishing gradients and, to a certain extent, hinders the elimination of modal differences.
The results show that deep cross-modal methods outperform all the shallow hashing methods, and that the proposed SAAH is clearly superior to all of the compared methods.

Summary

Introduction

With the widespread popularity of the Internet and mobile devices, the scale of multimodal data (text, image, video, audio, etc.) has increased dramatically. While the amount of multimedia data has exploded, its forms have also become more and more diversified. Given a query image, it may be necessary to retrieve a set of text that best describes the image, or to match a given text to a set of visually related images. Because data from different modalities are heterogeneous and their distributions and representations are inconsistent, the key challenge in cross-modal retrieval is the “modality gap”, that is, how to measure the similarity between different modal representations [8,29].

[Figure: example from a cross-modal dataset pairing an image with text descriptions: "Player in red attempts a hit against two blockers in blue and white, during an Olympic indoor volleyball match"; "A volleyball match, with players in primarily red uniforms, being played in front of a large crowd"; "A male volleyball ..."]
