Modality-Balanced Models for Visual Dialogue

Hyounghun Kim, Hao Tan, Mohit Bansal

doi:10.1609/aaai.v34i05.6320

Abstract

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response in the dialogue. However, through manual analysis, we find that a large number of conversational questions can be answered by looking only at the image, without any access to the context history, while others still need the conversation context to be answered correctly. We demonstrate that, for this reason, previous joint-modality (history and image) models over-rely on the dialogue history and are more prone to memorizing it (e.g., by extracting certain keywords or patterns in the context information). In contrast, image-only models are more generalizable (because they cannot memorize or extract keywords from the history) and perform substantially better on the primary task metric, normalized discounted cumulative gain (NDCG), which allows multiple correct answers. This observation motivates us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and to combine their complementary abilities into a more balanced multimodal model. We present multiple methods for integrating the two models, via ensembling and via consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics) and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics.
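
The abstract does not say how the two models' scores are combined or how NDCG is computed. The sketch below illustrates both ideas under our own assumptions. Each model is assumed to output one logit per candidate answer. The ensemble is a plain weighted average of softmax probabilities; this is a generic late-fusion ensemble, not the paper's consensus dropout fusion. NDCG uses a cutoff K equal to the number of candidates with nonzero relevance, which is how we understand the official VisDial evaluation to work. The function names, the `weight` parameter, and the toy numbers are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one question's candidate-answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_scores(image_only: Sequence[float],
                    image_history: Sequence[float],
                    weight: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of the two models' probabilities.

    `weight` is the share given to the image-only model (an assumption, not a
    value from the paper).
    """
    p_img = softmax(image_only)
    p_hist = softmax(image_history)
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_img, p_hist)]


def ndcg(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """NDCG@K for one question, with K = number of candidates with relevance > 0."""
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0
    # Rank candidate indices by predicted score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(ranked[:k]))
    # Ideal DCG: the same discounts applied to the relevances in the best order.
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg


# Toy example (made-up numbers): four candidates, two of them relevant.
relevance = [1.0, 0.5, 0.0, 0.0]
image_only_logits = [2.0, 1.5, 0.1, 0.0]     # ranks both relevant answers first
image_history_logits = [0.0, 0.2, 3.0, 0.1]  # latches onto an irrelevant candidate

for name, s in [("image-only", image_only_logits),
                ("image-history", image_history_logits),
                ("ensemble", ensemble_scores(image_only_logits, image_history_logits))]:
    print(f"{name}: NDCG = {ndcg(s, relevance):.3f}")
```

On this toy input, the averaged scores land between the two single models. Because NDCG rewards every candidate with nonzero relevance, it penalizes a model that confidently promotes a single memorized answer.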

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 22

Similar Papers

Ensemble of MRR and NDCG models for Visual Dialog
25 May 2021

Ensemble of MRR and NDCG models for Visual Dialog
Idan Schwartz
01 Jan 2020

The wave concept inventory-a cognitive instrument based on Bloom's taxonomy
T.R Thoads ... R.J Roedel
01 Nov 1999

Exploring Contextual-Aware Representation and Linguistic-Diverse Expression for Visual Dialog
Xiangpeng Li ... Jingkuan Song
17 Oct 2021