Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals.

Lu Jin, Jinhui Tang, Zechao Li

doi:10.1109/tnnls.2020.2997020

Abstract

Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval. The proposed deep hashing framework uses a 2-D convolutional neural network (CNN) as the backbone to capture spatial information for image-text retrieval, and a 3-D CNN as the backbone to capture spatial and temporal information for video-text retrieval. In the DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both intermodality similarities and intramodality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for the classification task, two stream networks are jointly trained to learn the hash functions by embedding the semantic labels in the resultant hash codes. Moreover, a unified deep multimodal hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, intermodality similarity-preserving learning, semantic label-preserving learning, and hash function learning with different types of loss functions. The proposed DSMHN is a generic and scalable deep hashing framework for both image-text and video-text retrieval, and it can be flexibly integrated with different types of loss functions. We conduct extensive experiments on both single-modal and cross-modal retrieval tasks using four widely used multimodal retrieval data sets. Experimental results on both image-text and video-text retrieval tasks demonstrate that the DSMHN significantly outperforms the state-of-the-art methods.
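The abstract attributes the appeal of hashing to its efficiency in computation and storage. The following is a minimal sketch of the retrieval step that hashing methods in general rely on, not the DSMHN training procedure itself: real-valued network outputs are binarized by sign, packed into integers, and ranked by Hamming distance. The function names (`binarize`, `hamming`, `rank_by_hamming`) and the 4-bit toy outputs are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign(features) into an integer: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed hash codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Hypothetical outputs of an image network (database) and a text network (query).
# In a learned cross-modal hashing model both would come from trained hash functions.
image_outputs = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.8, 0.6],
    [0.8, -0.1, -0.3, -0.9],
]
text_query_output = [0.7, -0.4, 0.1, -0.6]

image_codes = [binarize(v) for v in image_outputs]
query_code = binarize(text_query_output)
ranking = rank_by_hamming(query_code, image_codes)
```

Because each item is stored as a short bit string and compared with an XOR followed by a bit count, both the memory footprint and the per-comparison cost stay small, which is the property the abstract refers to.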

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Jun 5, 2020
Citations: 56

Similar Papers

Learning Hash Codes with Listwise Supervision
Jun Wang ... Andy X Sun
01 Dec 2013

MOON: Multi-hash codes joint learning for cross-media retrieval
Donglin Zhang ... Josef Kittler
Pattern Recognition Letters | VOL. 151
01 Nov 2021

Scalable Multimedia Retrieval by Deep Learning Hashing with Relative Similarity Learning
Lianli Gao ... Dongxiang Zhang
13 Oct 2015

Jointly Multiple Hash Learning
Xingbo Liu ... Yilong Yin
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 33
17 Jul 2019
