Visual Question Answering with Textual Representations for Images

Yusuke Hirota, Mayu Otani, Ittetsu Taniguchi, Takao Onoye, Chenhui Chu, Yuta Nakashima, Noa Garcia

doi:10.1109/iccvw54120.2021.00353

Publication Date: Oct 1, 2021

Citations: 4

Affiliation: Osaka University, CyberAgent (Japan), Kyoto University

#Visual Question Answering #Deep Visual Features

Abstract

How far can we go with textual representations for understanding pictures? Deep visual features extracted by object recognition models are widely used in multiple tasks, especially in visual question answering (VQA). However, conventional deep visual features may struggle to convey all the details of an image the way we humans can. Meanwhile, given the recent progress of language models, descriptive text may offer an alternative. This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA.