Learning visual and textual representations for multimodal matching and classification

Yu Liu, Li Liu, Yanming Guo, Michael S Lew

doi:10.1016/j.patcog.2018.07.001

Abstract

Multimodal learning, which aims to bridge the modality gap between heterogeneous representations such as vision and language, has been an important and challenging problem for decades. Unlike many current approaches, which focus on either multimodal matching or classification alone, we propose a unified network (MMC-Net) that jointly learns multimodal matching and classification between images and texts. MMC-Net seamlessly integrates the matching and classification components: it first learns visual and textual embedding features in the matching component, and then generates discriminative multimodal representations in the classification component. Combining the two components in a unified model can improve the performance of both. Moreover, we present a multi-stage training algorithm that minimizes both the matching and classification loss functions. Experimental results on four well-known multimodal benchmarks demonstrate the effectiveness and efficiency of the proposed approach, which achieves competitive performance on multimodal matching and classification compared with state-of-the-art approaches.
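
The abstract does not specify the loss functions, so the following is only a minimal sketch of what jointly minimizing a matching loss and a classification loss could look like. It assumes a bidirectional hinge ranking loss over cosine similarities for matching and a softmax cross-entropy for classification, combined with a scalar weight. The function names, the margin, and the weight are hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matching_loss(img_embs, txt_embs, margin=0.2):
    """Bidirectional hinge ranking loss (an assumed choice, not the paper's).

    img_embs[i] and txt_embs[i] are treated as a matching pair; every other
    combination in the batch is treated as a negative.
    """
    n = len(img_embs)
    sim = [[cosine(img_embs[i], txt_embs[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i should be closer to its own text than to text j.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Text i should be closer to its own image than to image j.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss / n


def classification_loss(logits, labels):
    """Mean softmax cross-entropy over a batch of logit rows."""
    total = 0.0
    for row, label in zip(logits, labels):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[label]
    return total / len(labels)


def joint_loss(img_embs, txt_embs, logits, labels, weight=1.0):
    """Weighted sum of the two losses; `weight` is a hypothetical trade-off."""
    return matching_loss(img_embs, txt_embs) + weight * classification_loss(logits, labels)
```

In a multi-stage schedule, one plausible reading is that the matching term is optimized first and the combined objective afterwards; the sketch only shows the combined objective and leaves the schedule open.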

Journal: Pattern Recognition	Publication Date: Jul 2, 2018
Citations: 27	License type: cc-by-nc-nd

Similar Papers

Secure and flexible certificate access in WS-security through LDAP component matching
Sang Seok Lim ... Kurt D Zeilenga
01 Jan 2004

Research and improvement of multi-mode matching algorithm based on KR-BMSH
Yansen Zhou ... Jinlin Zeng
01 Oct 2016

Multimodal image matching based on Multimodality Robust Line Segment Descriptor
Chunyang Zhao ... Bo Li
Neurocomputing | VOL. 177
03 Dec 2015

High Camouflage Intrusion Detection Method for Structured Database Based on Multi Pattern Matching
Dawei Song ... Xun Zhu
Journal of Physics: Conference Series | VOL. 2066
01 Nov 2021
