Abstract

Assessing an AI agent that can converse in human language and understand visual content is challenging. Generation metrics, such as the BLEU score, favor correct syntax over semantics. Hence a discriminative approach is often used, where an agent ranks a set of candidate answers. The mean reciprocal rank (MRR) metric evaluates model performance based on the rank of a single human-derived answer. This approach, however, raises a new challenge: answer ambiguity and synonymy, e.g., the semantic equivalence of ‘yeah’ and ‘yes’. To address this, the normalized discounted cumulative gain (NDCG) metric has been used to capture the relevance of all correct answers via dense annotations. However, the NDCG metric favors generic, uncertain answers that are usually applicable, such as ‘I don’t know.’ Crafting a model that excels on both the MRR and NDCG metrics is challenging. Ideally, an AI agent should give a human-like reply while also acknowledging the correctness of any relevant answer. To address this issue, we describe a two-step non-parametric ranking approach that can merge strong MRR and NDCG models. Using our approach, we retain most of the state-of-the-art MRR performance (70.41% vs. 71.24%) and NDCG performance (72.16% vs. 75.35%). Moreover, our approach won the recent Visual Dialog 2020 challenge. Source code is available at https://github.com/idansc/mrr-ndcg.
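To make the tension between the two metrics concrete, below is a minimal Python sketch of how MRR and NDCG score a single ranked list of answer candidates. The candidate strings, relevance values, and the simplification of scoring the full list (the official Visual Dialog NDCG truncates to the relevant candidates) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def reciprocal_rank(ranked_candidates, gt_answer):
    """Reciprocal rank of the single human-derived (ground-truth) answer."""
    rank = ranked_candidates.index(gt_answer) + 1  # ranks are 1-based
    return 1.0 / rank

def ndcg(ranked_candidates, relevance):
    """Simplified NDCG: discounted gain of dense relevance annotations,
    normalized by the gain of an ideal, relevance-sorted ranking."""
    gains = np.array([relevance.get(a, 0.0) for a in ranked_candidates])
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    dcg = float((gains * discounts).sum())
    ideal_dcg = float((np.sort(gains)[::-1] * discounts).sum())
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: 'yes' is the human-derived answer, 'yeah' is equally relevant
# under dense annotations, and 'i don't know' is partially relevant.
candidates = ["i don't know", "yeah", "yes", "no"]
relevance = {"yes": 1.0, "yeah": 1.0, "i don't know": 0.5}

print(reciprocal_rank(candidates, "yes"))  # 0.33: MRR only rewards the single human answer
print(ndcg(candidates, relevance))         # ~0.87: NDCG rewards every relevant answer
```

The same ranking thus scores poorly on MRR while scoring well on NDCG, which is exactly the gap the two-step merge is designed to close.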

Highlights


  • Prior works focus on optimizing a single metric (Guo et al, 2019; Jiang et al, 2020; Hu et al). In contrast, we describe two steps: (i) the mean reciprocal rank (MRR) step, responsible for keeping the human-derived answer ranked high, and (ii) the normalized discounted cumulative gain (NDCG) step, responsible for ranking the remaining candidates when the MRR model is not certain

  • When the NDCG model and the MRR model agree that a candidate is likely to be correct, it implies that both the NDCG and MRR metrics gain by ranking this candidate high


Summary

Related Work

Visual conversation evaluation: Early attempts to marry conversation with vision used street-scene images and binary questions (Geman et al, 2015). A different approach, suggested by the VQA dataset, focuses on brief, mostly one-word answers (Antol et al, 2015). Schwartz et al (2019b) propose Factor Graph Attention (FGA), a model that lets all entities (e.g., question words, image regions, answer candidates, and caption words) interact to infer an attention map for each modality. Murahari et al (2020) recently propose the Large-Scale (LS) model, which pre-trains on related vision-language datasets, e.g., Conceptual Captions and Visual Question Answering (Sharma et al, 2018; Antol et al, 2015). Prior works focus on optimizing a single metric (Guo et al, 2019; Jiang et al, 2020; Hu et al). In the following, we describe two steps: (i) the MRR step, responsible for keeping the human-derived answer ranked high, and (ii) the NDCG step, responsible for ranking the remaining candidates when the MRR model is not certain. We add the MRR model's answer at the first retrieval position.
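The merge is described here only at a high level, so the following Python sketch is just one plausible reading of the two steps rather than the authors' exact procedure; the per-candidate score dictionaries, the certainty threshold, and the top-k agreement rule are all illustrative assumptions.

```python
def merge_rankings(candidates, mrr_scores, ndcg_scores,
                   certainty_threshold=0.5, agreement_k=5):
    """Two-step, non-parametric merge of an MRR model and an NDCG model.

    Step (i), the MRR step: pin the MRR model's top answer at rank 1 so the
    human-derived answer stays highly ranked.
    Step (ii), the NDCG step: order the remaining candidates with the NDCG
    model, preferring candidates that both models place in their top-k.
    """
    by_mrr = sorted(candidates, key=lambda c: -mrr_scores[c])
    by_ndcg = sorted(candidates, key=lambda c: -ndcg_scores[c])
    top_answer = by_mrr[0]  # MRR step: occupy the first retrieval slot

    rest = [c for c in candidates if c != top_answer]
    if mrr_scores[top_answer] >= certainty_threshold:
        # The MRR model is certain: candidates both models rank highly
        # ("NDCG-agreement answers") come first, then the NDCG ordering.
        agreement = set(by_mrr[:agreement_k]) & set(by_ndcg[:agreement_k])
        rest.sort(key=lambda c: (c not in agreement, -ndcg_scores[c]))
    else:
        # The MRR model is not certain: fall back entirely to the NDCG model.
        rest.sort(key=lambda c: -ndcg_scores[c])

    return [top_answer] + rest
```

Calling merge_rankings(candidates, mrr_scores, ndcg_scores) with scores from the two pre-trained models yields a single fused ranking that can then be evaluated against both metrics.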

Top answers
NDCG-agreement answers
High-certainty answers
NDCG step
Conclusions
A Qualitative Analysis
Findings
