Abstract

Visual Question Answering (VQA) is a research topic of significant interest at the intersection of computer vision and natural language understanding. Recent research indicates that attributes and knowledge can effectively improve performance on both image captioning and VQA. In this article, an inner knowledge-based Img2Doc algorithm for VQA is presented. Inner knowledge is characterized as the attribute relationships within a visual image. In addition to using an attribute network for inner knowledge-based image representation, the VQA scheme is coupled with a question-guided Doc2Vec method for question answering. The attribute network generates inner knowledge-based features for visual images, while the novel question-guided Doc2Vec method converts natural-language text into vector features. Once extracted, these text features are combined with the visual image features and fed into a classifier that produces an answer. In this way, the model resolves the VQA problem as textual question answering. Experimental results demonstrate that the proposed method achieves superior performance on multiple benchmark datasets.
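Since the abstract describes a fusion step (attribute-network image features combined with question-guided Doc2Vec question features and passed to an answer classifier), the sketch below illustrates that step only. It is a minimal illustration, not the paper's implementation: the module name, feature dimensions, and the concatenation-based fusion are all assumptions, as the abstract does not specify the actual architecture.

```python
# Minimal sketch of the fusion-and-classify step, assuming concatenation fusion.
# FusionVQAClassifier and all dimensions are hypothetical placeholders.
import torch
import torch.nn as nn

class FusionVQAClassifier(nn.Module):
    def __init__(self, img_dim=2048, doc_dim=300, hidden_dim=1024, num_answers=1000):
        super().__init__()
        # Fuse inner knowledge-based image features with Doc2Vec question
        # features, then score each candidate answer.
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + doc_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, img_feat, doc_feat):
        # img_feat: attribute-network features for the image, shape (batch, img_dim)
        # doc_feat: question-guided Doc2Vec features, shape (batch, doc_dim)
        fused = torch.cat([img_feat, doc_feat], dim=-1)
        return self.classifier(fused)  # logits over candidate answers

# Usage with random placeholder features:
model = FusionVQAClassifier()
img = torch.randn(4, 2048)
doc = torch.randn(4, 300)
logits = model(img, doc)        # shape: (4, 1000)
answer = logits.argmax(dim=-1)  # predicted answer index per example
```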
