BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

Minjun Kim, Youhan Lee, Kyungtae Lim, Haneol Jang, Seungwoo Song

doi:10.1609/aaai.v38i16.29798

Abstract

Current research on generative models, such as the recently developed GPT4, aims to find relevant knowledge for multimodal and multilingual inputs in order to provide answers. Against this background, demand has grown for multilingual evaluation of visual question answering (VQA), a representative task for multimodal systems. Accordingly, in this study we propose a bilingual outside-knowledge VQA (BOK-VQA) dataset that can be extended to further languages. The proposed data include 17K images, 17K question-answer pairs for both Korean and English, and 280K instances of knowledge information related to the question-answer content. We also present a framework that can effectively inject knowledge information into a VQA system by pretraining on the knowledge information of the BOK-VQA data in the form of graph embeddings. Finally, through in-depth analysis, we demonstrate the actual effect of the knowledge information contained in the constructed training data on VQA.

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Similar Papers

VQAR: Review on Information Retrieval Techniques based on Computer Vision and Natural Language Processing
Shivangi Modi ... Dhatri Pandya
01 Mar 2019

Visual Question Answering Using Deep Learning: A Survey and Performance Analysis
Yash Srivastava ... Vaishnav Murali
01 Jan 2020

Counting in Visual Question Answering: Methods, Datasets, and Future Work
Tesfayee Meshu Welde ... Lejian Liao
International Journal of Image and Graphics
20 Oct 2023

Medical knowledge-based network for Patient-oriented Visual Question Answering
Jian Huang ... Wenyin Liu
Information Processing & Management | VOL. 60
21 Dec 2022
