Abstract

Computer vision and natural language processing have each seen enormous research progress in recent years. Despite this progress, it remains challenging for machines to extract image semantics and then communicate the extracted information to users. Visual Question Answering (VQA) addresses this problem by connecting the computer vision and natural language processing paradigms. In VQA, the system is presented with an image and a textual question related to that image, and it generates an answer by processing both image and textual features. The answer produced by a VQA system may be a single word, a phrase, or a sentence. Various datasets are available for training and evaluating VQA systems; they contain real or abstract images together with question-answer pairs grounded in the semantics of each image. VQA is applied in many areas, such as assistance for blind and visually impaired users, robotics, and art galleries. This paper discusses VQA techniques and datasets, and highlights the parametric evaluation of these techniques along with generic issues in VQA systems.
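
To make the described pipeline concrete, the sketch below shows one common way a VQA system combines image and question features to predict a short answer. This is a minimal illustrative example assuming a PyTorch environment; the module sizes, vocabulary, answer set, and the element-wise fusion are hypothetical placeholders, not the specific techniques surveyed in this paper.

```python
# Toy joint-embedding VQA model: encode the question with an LSTM, project
# pre-extracted image features, fuse the two modalities, and classify over a
# fixed answer vocabulary (single-word or short-phrase answers).
import torch
import torch.nn as nn

class ToyVQA(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=300, hidden_dim=512,
                 img_feat_dim=2048, num_answers=100):
        super().__init__()
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Project image features (e.g., from a CNN) to the question's dimensionality.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Classifier over candidate answers.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, img_feats, question_tokens):
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                      # final hidden state as the question vector
        v = torch.relu(self.img_proj(img_feats))
        fused = q * v                  # element-wise fusion of both modalities
        return self.classifier(fused)  # scores over candidate answers

# Usage with random placeholder inputs: a batch of 2 images and questions.
model = ToyVQA()
img_feats = torch.randn(2, 2048)           # stand-in for CNN image features
question = torch.randint(0, 1000, (2, 8))  # stand-in for tokenized questions
answer_scores = model(img_feats, question)
print(answer_scores.shape)                 # torch.Size([2, 100])
```

Real VQA systems surveyed here differ mainly in the image encoder, the question encoder, and the fusion step (e.g., attention mechanisms instead of a simple element-wise product), but the overall structure follows this pattern.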
