Joint embedding VQA model based on dynamic word vector.

Zhiyang Ma, Xiaobing Chen, Wenfeng Zheng, Lirong Yin

doi:10.7717/peerj-cs.353

Abstract

Existing joint embedding Visual Question Answering (VQA) models combine different image representations, text representations, and feature fusion methods, but all of them use static word vectors for text representation. In real language use, however, the same word can carry different meanings in different contexts and can serve as different grammatical components. Static word vectors cannot express these differences effectively, which can introduce semantic and grammatical deviations. To address this problem, this article constructs a joint embedding model based on dynamic word vectors, the none KB-Specific network (N-KBSN) model, in contrast to the commonly used VQA models based on static word vectors. The N-KBSN model consists of three main parts: a question text and image feature extraction module, a self-attention and guided-attention module, and a feature fusion and classifier module. Its key components are image representation based on Faster R-CNN, text representation based on ELMo, and feature enhancement based on a multi-head attention mechanism. The experimental results show that N-KBSN outperforms the 2017-winner (GloVe) model and the 2019-winner (GloVe) model. Introducing dynamic word vectors improves the accuracy of the overall results.
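
The contrast between static and dynamic word vectors can be illustrated with a small toy sketch. This is not ELMo, which derives contextual vectors from a bidirectional language model, and it is not the N-KBSN implementation. It is a stand-in that mixes each word with its sentence through a single unlearned scaled dot-product self-attention step. The 2-d embedding values and the function names (`static_vectors`, `contextual_vectors`) are assumptions made up for illustration.

```python
import math

# Hypothetical 2-d static embeddings (toy values, not taken from the paper).
STATIC = {
    "river": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "money": [0.0, 1.0],
}

def static_vectors(tokens):
    """Static lookup: every word maps to one fixed vector, whatever the context."""
    return [STATIC[t] for t in tokens]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextual_vectors(tokens):
    """Toy context-dependent vectors: each token attends over its sentence
    (single-head scaled dot-product self-attention with no learned weights)."""
    x = static_vectors(tokens)
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, x)) for i in range(d)])
    return out

s1 = ["river", "bank"]
s2 = ["money", "bank"]

# The static vector for "bank" is identical in both sentences.
static_same = static_vectors(s1)[1] == static_vectors(s2)[1]

# The context-mixed vector for "bank" differs between the two sentences.
ctx1 = contextual_vectors(s1)[1]
ctx2 = contextual_vectors(s2)[1]
print(static_same, ctx1, ctx2)
```

In this sketch the vector for "bank" shifts toward "river" in the first sentence and toward "money" in the second, which is the kind of context sensitivity that a static lookup cannot provide.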

Journal: PeerJ Computer Science
Publication Date: Mar 3, 2021
Citations: 127
License type: CC BY 4.0

Similar Papers

BPCN: A simple and efficient model for visual question answering
Feng Yan ... Yachang Chai
01 Dec 2022

Multiscale Feature Extraction and Fusion of Image and Text in VQA
Siyu Lu ... Yueming Ding
International Journal of Computational Intelligence Systems | VOL. 16
11 Apr 2023

Accuracy vs. complexity: A trade-off in visual question answering models
Moshiur Farazi ... Nick Barnes
Pattern Recognition | VOL. 120
12 Jun 2021

Chinese Named Entity Recognition for Hazard And Operability Analysis Text
Fangguo Li ... Beike Zhang
01 Aug 2020
