NLP Meets Vision for Visual Interpretation - A Retrospective Insight and Future directions

Ahmed Jamshed, Muhammad Moazam Fraz

doi:10.1109/icodt252288.2021.9441517

Abstract

Recent advances in NLP (Natural Language Processing) and CV (Computer Vision) have prompted researchers to test the limits of the latest deep learning techniques by applying them to more complex AI tasks. One such task is VQA (Visual Question Answering), which spans many levels of complexity. Some questions are simple and have obvious answers, while others are more complex and require logical reasoning, common sense, and factual knowledge. Starting simple and gradually adding complexity is a sound approach in scientific research and development. Early VQA datasets consisted of simple question-answer pairs with images depicting simple concepts, and relatively naive VQA models were trained on them. Over time, VQA datasets became more complicated and demanded more cognitive capability from VQA models. This evolution pushed VQA models to come closer to human cognitive abilities by reasoning with common sense and factual knowledge. In this survey, we first discuss some of the well-known VQA datasets, then cover crucial advances in VQA architectures and current work on integrating common sense and knowledge into these models. Reasoning is crucial for truly intelligent systems, but representations in deep learning models are inherently fuzzy and vague. We need models that can transparently explain the reasoning behind their predictions, as older expert systems that operated on symbolic knowledge did, so architectures that combine deep learning techniques with symbolic representations are also part of our discussion. We also discuss the impact of transformers on deep learning and how transformer-based models are quickly becoming state-of-the-art in almost every deep learning task.

Similar Papers

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding.
Qingxing Cao ... Liang Lin
IEEE Transactions on Neural Networks and Learning Systems | VOL. 33
01 Jan 2020

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal ... Douglas Summers-Stay
International Journal of Computer Vision | VOL. 127
11 Sep 2018

Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
Gabriel Grand ... Yonatan Belinkov
01 Jan 2019

Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Yangyang Guo ... Yibing Liu
18 Jul 2019
