Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson, Lei Zhang, Damien Teney, Xiaodong He, Chris Buehler, Mark Johnson, Stephen Gould

doi:10.1109/cvpr.2018.00636

Abstract

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
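
To make the division of labour in the abstract concrete, the following is a minimal sketch of attention over region features, not the authors' implementation. It assumes the bottom-up stage has already produced one feature vector per region, and it uses a plain dot product followed by a softmax as the top-down scoring step; the abstract does not specify the scoring function, so that choice, along with the names `attend`, `regions` and `query` and the toy values, is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features: Sequence[Sequence[float]],
           query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weight region features by their relevance to a task-specific query.

    Assumption: relevance is a dot product between each region feature and
    the query. This stands in for whatever scoring function a real model
    would learn.
    """
    scores = [sum(f * q for f, q in zip(feat, query))
              for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The attended feature is the weighted sum of all region features.
    attended = [sum(w * feat[d] for w, feat in zip(weights, region_features))
                for d in range(dim)]
    return weights, attended


# Three hypothetical region feature vectors (stand-ins for detector outputs).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical top-down context, e.g. derived from a question or partial caption.
query = [2.0, 0.0]

weights, attended = attend(regions, query)
print(weights)
print(attended)
```

In this toy setup the first and third regions align with the query and so receive equal, larger weights than the second. The point of the sketch is only that the set of candidate regions is fixed by the bottom-up stage, while the weighting depends on the task context.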

Similar Papers

Unsupervised image saliency detection with Gestalt-laws guided optimization and visual attention based refinement
Yijun Yan ... Jin Zhan
Pattern Recognition | VOL. 79
05 Feb 2018

Brain Dynamics of Distractibility: Interaction Between Top-Down and Bottom-Up Mechanisms of Auditory Attention
Aurélie Bidet-Caulet ... Olivier Bertrand
Brain Topography | VOL. 28
15 Feb 2014

Attentional dynamics during free picture viewing: Evidence from oculomotor behavior and electrocortical activity
Thomas Fischer ... Sven-Thomas Graupner
Frontiers in Systems Neuroscience | VOL. 7
01 Jan 2013

A Reinforcement-Learning Model of Top-Down Attention Based on a Potential-Action Map
Dimitri Ognibene ... Christian Balkenius
01 Jul 2008
