Abstract

Approaches to Grounded Language Learning are commonly focused on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict object attributes or to generalise to unseen situations. To remedy this, we present GroLLA, an evaluation framework for Grounded Language Learning with Attributes based on three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset, CompGuessWhat?!, as an instance of this framework for evaluating the quality of learned neural representations, in particular with respect to attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with several attributes from resources such as VISA and ImSitu. We then compare several hidden-state representations from current state-of-the-art approaches to Grounded Language Learning. By using diagnostic classifiers, we show that current models' learned representations are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies or representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (best zero-shot accuracy of 50.06%).

Highlights

  • Several grounded language learning tasks have been proposed to capture perceptual aspects of language (Shekhar et al., 2017; Hudson and Manning, 2019; Suhr et al., 2019; Agrawal et al., 2018).

  • Guesser accuracy: We evaluate the GDSE and DeVries models in gameplay mode using the set of reference games provided in CompGuessWhat?!

  • Attribute Prediction: We use the CompGuessWhat?! benchmark to compare several dialogue state representations, including DeVries-SL: the representation learned by the Questioner model that generates the question tokens conditioned on the image features and is trained using Supervised Learning (SL).


Introduction

Several grounded language learning tasks have been proposed to capture perceptual aspects of language (Shekhar et al., 2017; Hudson and Manning, 2019; Suhr et al., 2019; Agrawal et al., 2018).

[Figure: an example scene graph in which a microwave is left-of an oven; the legend distinguishes abstract attributes ("is appliance", "has buttons") from situated ones ("is white"), shown alongside a dialogue turn and question.]

In the literature, several methods have been proposed to analyse what kind of information is captured by neural network representations (Kadar et al., 2017; Belinkov and Glass, 2019). Most of these works examine the hidden state representations learned by models trained on textual data only. Investigating whether the representations learned by a model exhibit forms of attribute composition is beneficial for assessing model interpretability and generalisation.
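A diagnostic classifier of this kind can be sketched as a simple linear probe: freeze the learned hidden states, train a lightweight classifier to predict an attribute from them, and treat the probe's F1 score as a measure of how linearly decodable that attribute is. The snippet below is a minimal illustration using synthetic vectors and a made-up "is white" attribute; the data, dimensions, and attribute name are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in hidden states: 1000 examples, 64-dim vectors. The attribute
# "is white" is (noisily) encoded in the first dimension only.
hidden_states = rng.normal(size=(1000, 64))
labels = (hidden_states[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# The probe itself: a plain linear classifier trained on frozen features.
# A high F1 means the attribute is linearly decodable from the representation;
# an F1 near chance means the representation does not expose it.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
f1 = f1_score(y_test, probe.predict(X_test))
print(f"Probe F1 for 'is white': {f1:.2f}")
```

Keeping the probe deliberately weak (linear, few parameters) is the usual design choice: a powerful probe could recover the attribute even from representations that encode it only in an entangled, hard-to-use form.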

