Abstract

With the vigorous development of mobile Internet technology and the popularization of smart devices, the amount of multimedia data has exploded and its forms have become increasingly diversified. People’s demand for information is no longer satisfied by single-modal retrieval, and cross-modal retrieval has become a research hotspot in recent years. Owing to the strong feature learning ability of deep learning, cross-modal deep hashing has been studied extensively. However, because different modalities have different distributions and representations, their similarity is difficult to measure directly, so it is urgent to bridge the modality gap and improve retrieval accuracy. Some previous work has introduced GANs into cross-modal hashing to reduce the semantic differences between modalities. However, most existing GAN-based cross-modal hashing methods suffer from unstable network training and vanishing gradients, which hinder the elimination of modality differences. To address this issue, this paper proposes a novel Semantic-guided Autoencoder Adversarial Hashing method for cross-modal retrieval (SAAH). First, two adversarial autoencoder networks, under the guidance of semantic multi-labels, maximize the semantic relevance of instances and maintain cross-modal invariance. Second, under semantic supervision, the adversarial module guides the feature learning process and maintains the relations between modalities. In addition, to preserve the inter-modal correlation of all similar pairs, two types of loss functions are used to maintain similarity. To verify the effectiveness of the proposed method, extensive experiments were conducted on three widely used cross-modal datasets (MIRFLICKR, NUS-WIDE and MS COCO); compared with several representative state-of-the-art cross-modal retrieval methods, SAAH achieved leading retrieval performance.
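To make the described architecture more concrete, the following is a minimal PyTorch sketch of one modality branch of an adversarial autoencoder hashing network, based only on the components named in the abstract (an encoder producing relaxed hash codes, a decoder reconstructing the input features, and a discriminator acting as the adversarial module). All layer sizes, class names and dimensions are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class HashAutoencoder(nn.Module):
    """One modality branch: encodes features to relaxed hash codes and reconstructs them."""
    def __init__(self, feat_dim=4096, hash_bits=64):
        super().__init__()
        # Encoder maps modality features to continuous codes in [-1, 1].
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, hash_bits), nn.Tanh(),
        )
        # Decoder reconstructs the original features from the codes (autoencoder loss).
        self.decoder = nn.Sequential(
            nn.Linear(hash_bits, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, x):
        code = self.encoder(x)      # relaxed (continuous) hash code
        recon = self.decoder(code)  # reconstruction of the input features
        return code, recon


class ModalityDiscriminator(nn.Module):
    """Adversarial module: tries to tell which modality a code came from."""
    def __init__(self, hash_bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hash_bits, 256), nn.ReLU(),
            nn.Linear(256, 1),      # logit: image (1) vs. text (0)
        )

    def forward(self, code):
        return self.net(code)


def binarize(code):
    # Final binary hash codes are obtained by taking the sign of the relaxed codes.
    return torch.sign(code)
```

In such a setup the encoders are trained to fool the discriminator so that image and text codes become indistinguishable, while semantic (multi-label) supervision and similarity losses keep the codes discriminative; the exact loss formulation here is not taken from the paper.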

Highlights

  • In recent years, with the widespread popularity of the Internet and mobile devices, the scale of multimodal data has increased dramatically and its forms have become increasingly diversified

  • Most existing GAN-based cross-modal retrieval methods rely on the original GAN loss function and training strategy, which leads to unstable network training and vanishing gradients and, to a certain extent, hinders the elimination of modality differences

  • The results show that deep cross-modal methods achieve better performance than all the shallow hashing methods, and the proposed SAAH is clearly superior to all of the compared methods


Summary

Introduction

With the widespread popularity of the Internet and mobile devices, the scale of multimodal data (text, image, video, audio, etc.) has increased dramatically, and its forms have become increasingly diversified. Given a query image, it may be necessary to retrieve a set of texts that best describe the image, or to match a given text to a set of visually relevant images. Because data of different modalities are heterogeneous and their distributions and representations are inconsistent, the key to cross-modal retrieval is the “modality gap”, that is, how to measure the similarity between representations of different modalities [8,29].

[Figure: example from a cross-modal dataset, an image of an Olympic indoor volleyball match paired with several textual descriptions of the scene.]
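As a simple illustration of how hashing sidesteps the modality gap at retrieval time (not code from the paper): once both modalities are mapped to binary codes, cross-modal similarity reduces to Hamming distance between codes. The function and variable names below are hypothetical, and the codes are random placeholders.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to a single query code.

    query_code: (bits,) array of +/-1 values for one modality (e.g. a text query).
    db_codes:   (n, bits) array of +/-1 values for the other modality (e.g. images).
    """
    # For +/-1 codes, Hamming distance = (bits - dot product) / 2.
    bits = db_codes.shape[1]
    dist = (bits - db_codes @ query_code) / 2
    return np.argsort(dist)  # indices of nearest neighbours first

# Toy usage with random 64-bit codes.
rng = np.random.default_rng(0)
db = np.sign(rng.standard_normal((1000, 64)))
query = np.sign(rng.standard_normal(64))
print(hamming_rank(query, db)[:5])  # top-5 retrieved items
```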


