Referring Expression Comprehension Via Enhanced Cross-modal Graph Attention Networks

Jia Wang, Yung-Hui Li, Hong-Han Shuai, Jingcheng Ke, Wen-Huang Cheng

doi:10.1145/3548688

Abstract

Referring expression comprehension aims to localize a specific object in an image according to a given language description. Understanding and bridging the gap between the different types of information in the visual and textual domains remains challenging. In general, a model must extract the salient features from the given expression and match them to the image. One challenge is that the number of region proposals generated by object detection methods far exceeds the number of entities in the corresponding language description. Notably, candidate regions that are not described by the expression can severely degrade referring expression comprehension. To tackle this problem, we first propose a novel Enhanced Cross-modal Graph Attention Networks (ECMGANs) framework that improves the matching between the expression and the entity positions in an image. We then propose an effective strategy named Graph Node Erase (GNE) that helps ECMGANs eliminate the effect of irrelevant objects on the target object. Experiments on three public referring expression comprehension datasets show unambiguously that our ECMGANs framework achieves better performance than other state-of-the-art methods. Moreover, GNE effectively yields higher visual-expression matching accuracy.
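
The abstract does not give implementation details, so the following is only a toy sketch of the general idea it describes: score each region proposal against the expression, erase the nodes that match poorly, and pick the target among those that remain. It is not the authors' ECMGANs or GNE. The function names (`cosine`, `softmax`, `erase_and_match`), the `keep` parameter, the use of cosine similarity, and the hand-made feature vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def erase_and_match(expr_vec, region_vecs, keep=2):
    """Hypothetical helper: keep the `keep` best-matching regions, erase the rest.

    Returns (indices of kept regions, index of the best-matching region).
    """
    scores = [cosine(expr_vec, r) for r in region_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # all other nodes are "erased"
    attn = softmax([scores[i] for i in kept])  # attention over surviving nodes only
    best = kept[max(range(len(kept)), key=lambda j: attn[j])]
    return kept, best

# Made-up expression embedding and four made-up region-proposal features.
expr = [1.0, 0.0, 1.0]
regions = [
    [0.9, 0.1, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.2],
    [0.1, 0.9, 0.1],
]
kept, best = erase_and_match(expr, regions, keep=2)
print(kept, best)
```

In this toy setting, regions that share little with the expression are dropped before the final attention step, so they cannot compete with the target. A real system would learn the features and the erasing criterion rather than use a fixed top-k rule.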


Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Publication Date: Feb 6, 2023
Citations: 3

Similar Papers

Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
Zhenfang Chen ... Kwan-Yee K Wong
01 Jun 2020

Exploring Logical Reasoning for Referring Expression Comprehension
Ying Cheng ... Rui-Wei Zhao
17 Oct 2021

A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension
Yue Liao ... Guanbin Li
01 Jun 2020

Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks
Peng Wang ... Qi Wu
01 Jun 2019
