Abstract

We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or difficulty annotations. Prior work has considered the diversity of ground-truth answers given by human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We propose to cluster the entropy values of the predicted answer distributions obtained by three different models: a baseline method that takes as input images and questions, and two variants that take as input images only and questions only. We use simple k-means to cluster the visual questions of the VQA v2 validation set. Then we use state-of-the-art methods to determine the accuracy and the entropy of the answer distributions for each cluster. A benefit of the proposed method is that no difficulty annotation is required, because the accuracy of each cluster reflects the difficulty of the visual questions that belong to it. Our approach can identify clusters of difficult visual questions that are not answered correctly by state-of-the-art methods. Detailed analysis on the VQA v2 dataset reveals that 1) all methods show poor performance on the most difficult cluster (about 10% accuracy), 2) as the cluster difficulty increases, the answers predicted by the different methods begin to differ, and 3) the values of cluster entropy are highly correlated with the cluster accuracy. We show that our approach has the advantage of being able to assess the difficulty of visual questions without ground truth (i.e., on the test set of VQA v2) by assigning them to one of the clusters. We expect that this can stimulate the development of novel directions of research and new algorithms.
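The core of the approach described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes we already have the softmax answer distributions predicted by the three models (question-only, image-only, and the joint baseline) for every visual question in the VQA v2 validation split, together with a per-question VQA accuracy; names such as `probs_q` and `vqa_accuracy`, and the choice of the number of clusters, are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (in nats) of each row of an (N, num_answers) matrix."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def cluster_by_entropy(probs_q, probs_i, probs_qi, n_clusters=5, seed=0):
    """Cluster visual questions by the entropies of the three models' predictions."""
    feats = np.stack([entropy(probs_q), entropy(probs_i), entropy(probs_qi)], axis=1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(feats)
    return km, km.labels_

def per_cluster_accuracy(labels, vqa_accuracy, n_clusters=5):
    """Mean VQA accuracy of a method inside each cluster, used as its difficulty."""
    return {c: float(vqa_accuracy[labels == c].mean()) for c in range(n_clusters)}
```

Because the cluster labels depend only on the models' predicted distributions, the per-cluster accuracy of any state-of-the-art method can then be computed on top of the same clustering.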

Highlights

  • Visual Question Answering (VQA) is one of the most challenging tasks in computer vision [1], [2]: given a pair of question text and image, a system is asked to answer the question

  • We propose to use the entropy values of answer predictions produced by different VQA models to evaluate the difficulty of visual questions for the models, in contrast to prior work [14] that uses the entropy of ground-truth answers as a metric of diversity or (dis)agreement of annotations

  • Clustering method: to perform clustering, we hypothesize that "easy visual questions lead to low entropy, while difficult visual questions lead to high entropy." A similar concept has been reported in terms of human consensus with multiple ground-truth annotations [13], but in this paper we address the relation between the difficulty and the entropy of answer distributions produced by VQA models (see the sketch after this list)
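The hypothesis in the last highlight can be made concrete with a toy example (not taken from the paper): a peaked answer distribution, as produced by a confident model on an easy question, has low entropy, while a flat distribution on a difficult question has entropy close to log(num_answers).

```python
import numpy as np

def entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

easy = np.array([0.96, 0.02, 0.01, 0.01])  # model is almost certain of one answer
hard = np.array([0.25, 0.25, 0.25, 0.25])  # model cannot decide between answers

print(entropy(easy))  # ~0.21 nats: low entropy, presumably an easy question
print(entropy(hard))  # log(4) ~ 1.39 nats: the maximum for 4 candidate answers
```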


Summary

INTRODUCTION

Visual Question Answering (VQA) is one of the most challenging tasks in computer vision [1], [2]: given a pair of question text and image (a visual question), a system is asked to answer the question. We propose to use the entropy values of answer predictions produced by different VQA models to evaluate the difficulty of visual questions for the models, in contrast to prior work [14] that uses the entropy of ground-truth answers as a metric of diversity or (dis)agreement of annotations. After training three different models (I, Q, and Q+I), predicting answer distributions, and computing entropy values, the visual questions are clustered. This is simple yet useful, and enables us to find which visual questions are most difficult to answer. Our key insight is that the difficulty of visual question clusters is common to all methods, and tackling the difficult clusters may lead to the development of a new generation of VQA methods.
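Since the cluster assignment uses only model predictions, unseen visual questions (e.g. the VQA v2 test split, which has no public ground truth) can be assigned a difficulty by mapping them to the nearest learned cluster. The sketch below assumes the k-means model `km` and the `entropy` helper from the earlier sketch; these names are illustrative, not from the paper's code.

```python
import numpy as np

def assess_difficulty(km, test_probs_q, test_probs_i, test_probs_qi, entropy_fn):
    """Assign each test question to a cluster using only model predictions."""
    feats = np.stack([entropy_fn(test_probs_q),
                      entropy_fn(test_probs_i),
                      entropy_fn(test_probs_qi)], axis=1)
    # Cluster id per question; no ground-truth answers are required, and the
    # validation accuracy of each cluster serves as its difficulty estimate.
    return km.predict(feats)
```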

