Abstract
Visual Question Answering (VQA) is an emerging AI research problem that combines computer vision, natural language processing, and knowledge representation and reasoning (KR). Given an image and a question about that image as input, it requires analyzing the visual components of the image, the type of question, and common-sense or general knowledge to predict the right answer. VQA is useful in various real-world applications, such as assisting visually impaired people, autonomous driving, and solving everyday tasks like spotting empty tables in restaurants, parks, or picnic spots. Since its introduction in 2014, many researchers have applied different techniques to Visual Question Answering, and various datasets have been introduced. This paper presents an overview of the available datasets and evaluation metrics used in the VQA area. The paper then surveys the different techniques used in the VQA domain, categorized by the mechanism they employ. Based on this detailed discussion and a performance comparison, we examine various challenges in the VQA domain and provide directions for future work.
International Journal of Next-Generation Computing