Abstract

Knowledge-based visual question answering aims to retrieve external knowledge facts for answering questions about images. Most existing methods emphasize high-order associations between knowledge facts and questions while neglecting the negative effects of unnecessary knowledge facts in multi-hop reasoning. In this paper, we propose a Dual-Stream Attention Multi-hop Reasoning (DSAMR) architecture that constructs two different attention streams to mitigate the influence of unnecessary knowledge facts. This dual-stream mechanism enables the model to reduce the attention weights on unnecessary knowledge while gathering essential knowledge, by learning the implicit correlations between knowledge facts and questions. In addition, we design a hypergraph knowledge extraction module that selects the most relevant knowledge facts by evaluating the relevance of each knowledge fact to the question. Experimental results demonstrate the effectiveness of our method not only on the knowledge-based visual question answering dataset KVQA, but also on the multi-hop question answering dataset PathQuestion.
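The paper's exact formulation is not reproduced in this abstract, but the core idea of a dual-stream attention mechanism can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes dot-product attention for the gathering stream and a sigmoid gate for the suppression stream, and the projection matrices `w_gather` and `w_suppress` are hypothetical names introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_stream_attention(question, facts, w_gather, w_suppress):
    """Aggregate knowledge facts with two attention streams (illustrative sketch).

    question:   (d,)  query embedding
    facts:      (n, d) knowledge-fact embeddings
    w_gather, w_suppress: (d, d) hypothetical projections for the two streams
    """
    # Stream 1: standard relevance scores for gathering essential knowledge.
    gather_scores = facts @ (w_gather @ question)
    # Stream 2: a gate in (0, 1) that down-weights unnecessary facts.
    suppress_gate = 1.0 / (1.0 + np.exp(-(facts @ (w_suppress @ question))))
    # Combine the streams and renormalize into a distribution over facts.
    weights = softmax(gather_scores) * suppress_gate
    weights = weights / weights.sum()
    # Return the attention-weighted aggregate of the knowledge facts.
    return weights @ facts

rng = np.random.default_rng(0)
q = rng.normal(size=4)
f = rng.normal(size=(3, 4))
agg = dual_stream_attention(q, f, np.eye(4), np.eye(4))
```

In this sketch, a fact that scores high on relevance but is flagged by the suppression gate contributes less to the final aggregate, which is the abstract's stated goal of mitigating unnecessary knowledge during multi-hop reasoning.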
