Abstract
Understanding and executing natural language instructions in a grounded domain is one of the hallmarks of artificial intelligence. In this paper, we focus on instruction understanding in the blocks world domain and investigate the language understanding abilities of two top-performing systems for the task. We aim to understand whether the test performance of these models indicates an understanding of the spatial domain and of the natural language instructions relative to it, or whether they merely over-fit spurious signals in the dataset. We formulate a set of expectations one might have of an instruction following model and concretely characterize the different dimensions of robustness such a model should possess. Despite decent test performance, we find that state-of-the-art models fall short of these expectations and are extremely brittle. We then propose a learning strategy that involves data augmentation and show through extensive experiments that it yields models that are competitive on the original test set while satisfying our expectations much better.
Highlights
Brittleness to adversarial perturbations has been demonstrated in computer vision (Goodfellow et al., 2014) and in certain natural language tasks
We find that state-of-the-art models fall short of these expectations and are extremely brittle.
The space of perturbations that we consider has the following attributes: (a) Semantic-Preserving or Semantic-Altering, (b) Linguistic or Geometric, and (c) Discrete or Continuous. We find that both models studied suffer a large performance drop under each of the perturbations and fall short of satisfying our expectations.
Despite the success of top-performing models (Tan and Bansal, 2018; Bisk et al., 2016) on the test set for this task, we question whether the models are able to reason about the complex language and spatial concepts of the task and generalize, or are merely over-fitting the test set.
Summary
Given the block configuration W ∈ R^(20×3) (the three-dimensional coordinate locations of a maximum of 20 unlabeled blocks B = b1, ..., b20) and an instruction I, the model has to move the appropriate block. While the target output is always a location y ∈ R^3, for the source task the model can either predict a particular block or its location.

Among the expectations we formulate: (1) the model's prediction should not change drastically on slightly perturbing the input; and (2) a Symmetry Equivariance Expectation: a symmetric transformation of an input should cause an equivalent transformation of the model's prediction.

When substituting spatial concepts, we adversarially pick the substitution with the highest loss over all combinations of substitutions from the synonyms in C. If the perturbation space is discrete and finite, we can simply search over all candidates (I′, W′) to find the one with the maximum loss. If it is continuous and infinite, we can use a first-order method (e.g., the Fast Gradient Sign Method, FGSM (Goodfellow et al., 2014)) to find the adversarial (I′, W′).

³ According to an FGSM attack with ε = 0.05.
⁴ The addition of distractor blocks at locations far from the source and target locations forms a similar perturbation set that leads to a significant performance drop for existing models (Appendix A).
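The discrete case above is simple to make concrete. The following is a minimal sketch, not the authors' code: the synonym table `C` and the black-box `model_loss(instruction, world)` callable are illustrative assumptions. It exhaustively enumerates every combination of synonym substitutions and returns the perturbed instruction I′ with the maximum loss.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative synonym sets C for spatial concepts (assumed, not from the paper).
C: Dict[str, List[str]] = {
    "above": ["above", "over", "on top of"],
    "nearest": ["nearest", "closest"],
}

def substitute(instruction: str, mapping: Dict[str, str]) -> str:
    """Return the instruction with each concept replaced by its chosen synonym."""
    for concept, synonym in mapping.items():
        instruction = instruction.replace(concept, synonym)
    return instruction

def adversarial_instruction(
    instruction: str,
    world,  # the block configuration W; opaque to this search
    concepts: Sequence[str],
    model_loss: Callable[[str, object], float],
) -> Tuple[str, float]:
    """Exhaustive search over the discrete, finite perturbation space:
    try every combination of synonym substitutions and keep the
    candidate I' with the maximum loss on (I', W)."""
    best, best_loss = instruction, model_loss(instruction, world)
    for choice in itertools.product(*(C[c] for c in concepts)):
        candidate = substitute(instruction, dict(zip(concepts, choice)))
        loss = model_loss(candidate, world)
        if loss > best_loss:
            best, best_loss = candidate, loss
    return best, best_loss
```

Since the space is a Cartesian product of small synonym sets, the loop stays cheap: with k concepts and at most s synonyms each, it evaluates at most s^k candidates.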
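For the continuous case, a one-step FGSM perturbation of the block coordinates might look as follows. This is a sketch under stated assumptions: it presumes a differentiable PyTorch model, and `model`, `loss_fn`, and the calling convention are placeholders rather than the systems evaluated in the paper.

```python
import torch

def fgsm_perturb_world(model, loss_fn, instruction,
                       world: torch.Tensor,   # W, shape (20, 3)
                       target: torch.Tensor,  # gold location y, shape (3,)
                       epsilon: float = 0.05) -> torch.Tensor:
    """One-step FGSM on the block configuration:
    W' = W + epsilon * sign(dL/dW), the first-order step that
    maximally increases the loss within an L-infinity ball."""
    world = world.clone().detach().requires_grad_(True)
    loss = loss_fn(model(instruction, world), target)
    loss.backward()
    return (world + epsilon * world.grad.sign()).detach()
```

The default ε = 0.05 matches footnote 3; in practice one would presumably also clamp W′ back to valid board coordinates.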