Abstract

Visual question answering (VQA) stands among the most researched problems at the intersection of computer vision, pattern recognition, and natural language processing. VQA extends the challenges of computer vision by requiring basic reasoning over visual scenes to answer questions about specific elements, actions, and relationships between objects in an image. Reasoning over images has long been a popular goal among computer vision and natural language processing researchers, and its quality depends directly on the expressivity of the representations learned from the datasets. In the past decade, with advances in computing hardware, neural networks, and the introduction of highly optimized and efficient software, a substantial body of research has addressed solving VQA efficiently. In this survey, we present an in-depth examination of representation learning in state-of-the-art VQA methods proposed in the literature and compare them to discuss future directions in the field.
