Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

Wei Suo, Mengyang Sun, Qi Wu, Peng Wang

doi:10.24963/ijcai.2021/143

Abstract

Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering. However, it has not been widely used in downstream tasks because existing methods suffer from two drawbacks: 1) two-stage methods incur heavy computation cost and inevitable error accumulation, and 2) one-stage methods depend on many hyper-parameters (such as anchors) to generate bounding boxes. In this paper, we present a proposal-free one-stage (PFOS) model that regresses the region of interest from the image, based on a textual query, in an end-to-end manner. Instead of following the dominant anchor-proposal paradigm, we directly take the dense grid of the image as input to a cross-attention transformer that learns grid-word correspondences. The final bounding box is predicted directly from the image, without the time-consuming anchor selection process of previous methods. Our model achieves state-of-the-art performance on four referring expression datasets with higher efficiency than the previous best one-stage and two-stage methods.
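The grid-word idea can be illustrated with a minimal NumPy sketch. It assumes single-head scaled dot-product cross-attention in which each image grid cell (query) attends over the word embeddings of the expression (keys/values), followed by mean pooling and a sigmoid head that regresses one normalized (cx, cy, w, h) box. The function names, weight matrices, pooling choice and box parameterization are hypothetical illustrations, not the published PFOS architecture; the point is only that no anchors or proposals are needed.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grid_word_cross_attention(grid, words, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention.

    grid:  (H*W, d) flattened dense image grid features (queries)
    words: (L, d) word features of the referring expression (keys/values)
    Returns fused grid features (H*W, d) and grid-word attention (H*W, L).
    """
    q = grid @ w_q
    k = words @ w_k
    v = words @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # grid-word affinities
    attn = softmax(scores, axis=-1)          # each grid cell attends over words
    return grid + attn @ v, attn             # residual connection


def regress_box(fused, w_box):
    # Hypothetical head: mean-pool all grid cells, project to 4 numbers,
    # squash to [0, 1] as a normalized (cx, cy, w, h) box. No anchors used.
    pooled = fused.mean(axis=0)
    logits = pooled @ w_box
    return 1.0 / (1.0 + np.exp(-logits))


# Usage with random features standing in for a CNN grid and word embeddings.
rng = np.random.default_rng(0)
H, W, L, d = 8, 8, 5, 32
grid = rng.standard_normal((H * W, d))
words = rng.standard_normal((L, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
w_box = rng.standard_normal((d, 4)) / np.sqrt(d)

fused, attn = grid_word_cross_attention(grid, words, w_q, w_k, w_v)
box = regress_box(fused, w_box)
print(fused.shape, attn.shape, box.shape)
```

Each row of `attn` is a distribution over the words of the query, which is one simple way to express "grid-word correspondences"; a real model would stack several such layers and train the weights end to end against ground-truth boxes.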

Similar Papers

Individual Participant Data Meta-Analysis for a Binary Outcome: One-Stage or Two-Stage?
Thomas P A Debray ... Karel G M Moons
PLoS ONE | VOL. 8
09 Apr 2013

Exploiting the Social-Like Prior in Transformer for Visual Reasoning
Yudong Han ... Xuemeng Song
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024

REAGENTS AND METHODS
Acta Medica Scandinavica | VOL. 157
12 Jan 1957

A Proposal-Free One-Stage Framework for Referring Expression Comprehension and Generation via Dense Cross-Attention
Mengyang Sun ... Yanning Zhang
IEEE Transactions on Multimedia | VOL. 25
01 Jan 2023