Learning Two-Branch Neural Networks for Image-Text Matching Tasks.

Liwei Wang, Jing Huang, Svetlana Lazebnik, Yin Li

doi:10.1109/tpami.2018.2797921

Abstract

Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via an element-wise product and is trained with a regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on the Flickr30K and MSCOCO datasets.
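
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes cosine similarity on L2-normalized vectors, an illustrative margin of 0.1, and that row i of the image batch matches row i of the text batch. All function names are hypothetical, and the neighborhood constraints, the neighborhood sampling, and the layers and regression loss of the similarity network are omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.1):
    """Hinge-style ranking loss over a batch where img_emb[i] matches txt_emb[i].

    Assumption: similarity is the dot product of unit vectors. Every other
    item in the batch is treated as a negative for the current query.
    """
    imgs = [l2_normalize(v) for v in img_emb]
    txts = [l2_normalize(v) for v in txt_emb]
    loss = 0.0
    for i in range(len(imgs)):
        pos = dot(imgs[i], txts[i])
        for j in range(len(imgs)):
            if j == i:
                continue
            # Image i as query: mismatched sentence j should score lower.
            loss += max(0.0, margin - pos + dot(imgs[i], txts[j]))
            # Sentence i as query: mismatched image j should score lower.
            loss += max(0.0, margin - pos + dot(imgs[j], txts[i]))
    return loss


def elementwise_fusion(img_vec, txt_vec):
    """Fuse two branch outputs by element-wise product (fusion step only)."""
    return [x * y for x, y in zip(img_vec, txt_vec)]
```

In this sketch the loss is zero when every matching pair beats every mismatched pair by at least the margin, and it grows as mismatched pairs score higher. In a similarity-style network, the fused vector would be passed to further layers that predict a single score; those layers are not shown here.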

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 24, 2018
Citations: 488
License type: publisher-specific, author manuscript

Similar Papers

Network embedding by fusing multimodal contents and links
Feiran Huang ... Zhoujun Li
Knowledge-Based Systems | VOL. 171
14 Feb 2019

Deep Attributed Network Embedding by Preserving Structure and Attribute Information
Richang Hong ... Yong Ge
IEEE Transactions on Systems, Man, and Cybernetics: Systems | VOL. 51
01 Mar 2021

Multimodal Co-Attention Mechanism for One-stage Visual Grounding
Zhihan Yu ... Mingcong Lu
26 Nov 2022

Enhanced Network Embedding with Text Information
Shuang Yang ... Bo Yang
01 Aug 2018
