CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network

Yuxin Peng, Jinwei Qi, Yuxin Yuan, Xin Huang

doi:10.1109/tmm.2017.2742704

Abstract

Cross-modal retrieval, which retrieves across multimedia data such as image and text, has become an active research topic. Most existing methods based on deep neural networks (DNNs) adopt a two-stage learning framework: the first stage generates a separate representation for each modality, and the second stage learns the cross-modal common representation. However, existing methods have three limitations: (1) In the first learning stage, they model only intra-modality correlation and ignore inter-modality correlation, which carries rich complementary context. (2) In the second learning stage, they adopt only shallow networks with single-loss regularization and ignore the intrinsic relevance of intra-modality and inter-modality correlation. (3) They consider only original instances and ignore the complementary fine-grained clues provided by their patches. To address these problems, this paper proposes a cross-modal correlation learning (CCL) approach with multi-grained fusion by hierarchical network, with the following contributions: (1) In the first learning stage, CCL exploits multi-level association with joint optimization to preserve the complementary context from intra-modality and inter-modality correlation simultaneously. (2) In the second learning stage, a multi-task learning strategy is designed to adaptively balance the intra-modality semantic category constraints and the inter-modality pairwise similarity constraints. (3) CCL adopts multi-grained modeling, which fuses coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Compared with 13 state-of-the-art methods on 6 widely used cross-modal datasets, the experimental results show that CCL achieves the best performance.
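
As a rough illustration of contribution (2), the sketch below combines a semantic category loss with a pairwise similarity loss for one image/text pair. It is not the paper's formulation: the function names, the contrastive form of the pairwise term, the `margin` value, and the fixed `weight` (used here in place of the adaptive balancing the abstract describes) are all assumptions made for the example.

```python
import math


def softmax_cross_entropy(logits, label):
    """Category loss for one instance (stands in for the intra-modality constraint)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_contrastive(img_vec, txt_vec, is_match, margin=1.0):
    """Pairwise loss (stands in for the inter-modality constraint).

    Matched pairs are pulled together; mismatched pairs are pushed
    at least `margin` apart. The margin is a hypothetical value.
    """
    d2 = squared_distance(img_vec, txt_vec)
    if is_match:
        return d2
    return max(0.0, margin - math.sqrt(d2)) ** 2


def joint_loss(img_vec, txt_vec, img_logits, txt_logits,
               img_label, txt_label, is_match, weight=0.5):
    """Weighted sum of both tasks; `weight` is fixed here, not learned."""
    category = (softmax_cross_entropy(img_logits, img_label)
                + softmax_cross_entropy(txt_logits, txt_label))
    pairwise = pairwise_contrastive(img_vec, txt_vec, is_match)
    return weight * category + (1.0 - weight) * pairwise
```

In this sketch, setting `weight` closer to 1 emphasizes category discrimination within each modality, while a value closer to 0 emphasizes alignment between paired image and text representations.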

Journal: IEEE Transactions on Multimedia
Publication Date: Feb 1, 2018
Citations: 202

Similar Papers

Show and Tell in the Loop: Cross-Modal Circular Correlation Learning
Yuxin Peng ... Jinwei Qi
IEEE Transactions on Multimedia | VOL. 21
01 Jun 2019

Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
Yi Yu ... Suhua Tang
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 15
13 Feb 2019

Improving cross-modal correlation learning with hyperlinks
Shuhui Wang ... Qingming Huang
01 Jun 2015

CM-GANs
Yuxin Peng ... Jinwei Qi
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 15
07 Feb 2019
