Focal Visual-Text Attention for Memex Question Answering.

Junwei Liang, Liangliang Cao, Alexander G Hauptmann, Li-Jia Li, Yannis Kalantidis, Lu Jiang

doi:10.1109/tpami.2018.2890628

Abstract

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photo albums, we have to look at whole collections with sequences of photos. This paper proposes a new multimodal MemexQA task: given a sequence of photos from a user, the goal is to automatically answer questions that help users recover their memory about an event captured in these photos. In addition to a text answer, a few grounding photos are also given to justify the answer. The grounding photos are necessary because they help users quickly verify the answer. Towards solving the task, we 1) present the MemexQA dataset, the first publicly available multimodal question answering dataset consisting of real personal photo albums; and 2) propose an end-to-end trainable network that uses a hierarchical process to dynamically determine which media and which points in time to focus on in the sequential data to answer the question. Experimental results on the MemexQA dataset demonstrate that our model outperforms strong baselines and yields the most relevant grounding photos on this challenging task.

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 7, 2019
Citations: 131
License type: publisher-specific, author manuscript

Similar Papers

Focal Visual-Text Attention for Visual Question Answering
Junwei Liang ... Liangliang Cao
01 Jun 2018

Neural Networks for Detecting Irrelevant Questions During Visual Question Answering
Mengdi Li ... Cornelius Weber
01 Jan 2020

More Than An Answer
Yiyi Zhou ... Yunsheng Wu
19 Oct 2017

Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering
Liyang Zhang ... Shuaicheng Liu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 32
17 Sep 2020
