Abstract
Understanding and executing natural language instructions in a grounded domain is one of the hallmarks of artificial intelligence. In this paper, we focus on instruction understanding in the blocks world domain and investigate the language understanding abilities of two top-performing systems for the task. We aim to understand whether the test performance of these models indicates an understanding of the spatial domain and of the natural language instructions relative to it, or whether they merely over-fit spurious signals in the dataset. We formulate a set of expectations one might have of an instruction following model and concretely characterize the different dimensions of robustness such a model should possess. Despite decent test performance, we find that state-of-the-art models fall short of these expectations and are extremely brittle. We then propose a learning strategy that involves data augmentation and show through extensive experiments that it yields models that are competitive on the original test set while satisfying our expectations much better.
Highlights
Brittleness to adversarial perturbations has been demonstrated in computer vision (Goodfellow et al., 2014) and in certain natural language tasks
We find that state-of-the-art models fall short of these expectations and are extremely brittle.
The space of perturbations that we consider has the following attributes: (a) Semantic-Preserving or Semantic-Altering, (b) Linguistic or Geometric, and (c) Discrete or Continuous. We find that both models studied suffer a large performance drop under each of the perturbations and fall short of satisfying our expectations.
Despite the success of top-performing models (Tan and Bansal, 2018; Bisk et al., 2016) on the test set for this task, we question whether the models are able to reason about the complex language and spatial concepts of the task and generalize, or are merely over-fitting the test set.
Summary
Given the block configuration W ∈ R^(20×3) (the three-dimensional coordinate locations of a maximum of 20 unlabeled blocks B = b1, ..., b20) and an instruction I, the model has to move the appropriate block. While the target output is always a location y ∈ R^3, for the source task the model can either predict a particular block or its location.

Among the expectations we formulate: (1) the model's prediction should not change drastically on slightly perturbing the input; and (2) a Symmetry Equivariance Expectation: a symmetric transformation of an input should cause an equivalent transformation of the model's prediction.

When substituting spatial concepts, we adversarially pick the substitution with the highest loss over all combinations of substitutions from the synonyms in C. If the perturbation space is discrete and finite, we can simply search over all candidates (I′, W′) to find the one with the maximum loss. If it is continuous and infinite, we can use a first-order method (e.g., the Fast Gradient Sign Method, FGSM (Goodfellow et al., 2014)) to find the adversarial (I′, W′).

³ According to an FGSM attack with ε = 0.05.
⁴ The addition of distractor blocks at locations far from the source and target locations forms a similar perturbation set that leads to a significant performance drop for existing models (Appendix A).
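The discrete case above is simple to make concrete. The following is a minimal sketch, not the authors' code: the synonym table `C` and the black-box `model_loss(instruction, world)` callable are illustrative assumptions. It exhaustively enumerates every combination of synonym substitutions and returns the perturbed instruction I′ with the maximum loss.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative synonym sets C for spatial concepts (assumed, not from the paper).
C: Dict[str, List[str]] = {
    "above": ["above", "over", "on top of"],
    "nearest": ["nearest", "closest"],
}

def substitute(instruction: str, mapping: Dict[str, str]) -> str:
    """Return the instruction with each concept replaced by its chosen synonym."""
    for concept, synonym in mapping.items():
        instruction = instruction.replace(concept, synonym)
    return instruction

def adversarial_instruction(
    instruction: str,
    world,  # the block configuration W; opaque to this search
    concepts: Sequence[str],
    model_loss: Callable[[str, object], float],
) -> Tuple[str, float]:
    """Exhaustive search over the discrete, finite perturbation space:
    try every combination of synonym substitutions and keep the
    candidate I' with the maximum loss on (I', W)."""
    best, best_loss = instruction, model_loss(instruction, world)
    for choice in itertools.product(*(C[c] for c in concepts)):
        candidate = substitute(instruction, dict(zip(concepts, choice)))
        loss = model_loss(candidate, world)
        if loss > best_loss:
            best, best_loss = candidate, loss
    return best, best_loss
```

Since the space is a Cartesian product of small synonym sets, the loop stays cheap: with k concepts and at most s synonyms each, it evaluates at most s^k candidates.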
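For the continuous case, a one-step FGSM perturbation of the block coordinates might look as follows. This is a sketch under stated assumptions: it presumes a differentiable PyTorch model, and `model`, `loss_fn`, and the calling convention are placeholders rather than the systems evaluated in the paper.

```python
import torch

def fgsm_perturb_world(model, loss_fn, instruction,
                       world: torch.Tensor,   # W, shape (20, 3)
                       target: torch.Tensor,  # gold location y, shape (3,)
                       epsilon: float = 0.05) -> torch.Tensor:
    """One-step FGSM on the block configuration:
    W' = W + epsilon * sign(dL/dW), the first-order step that
    maximally increases the loss within an L-infinity ball."""
    world = world.clone().detach().requires_grad_(True)
    loss = loss_fn(model(instruction, world), target)
    loss.backward()
    return (world + epsilon * world.grad.sign()).detach()
```

The default ε = 0.05 matches footnote 3; in practice one would presumably also clamp W′ back to valid board coordinates.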