Abstract

Executing natural language instructions in a physically grounded domain requires a model that understands both spatial concepts such as “left of” and “above”, and the compositional language used to identify landmarks and articulate instructions relative to them. In this paper, we study instruction understanding in the blocks world domain. Given an initial arrangement of blocks and a natural language instruction, the system executes the instruction by manipulating selected blocks. The highly compositional instructions are composed of atomic components and understanding these components is a necessary step to executing the instruction. We show that while end-to-end training (supervised only by the correct block location) fails to address the challenges of this task and performs poorly on instructions involving a single atomic component, knowledge-free auxiliary signals can be used to significantly improve performance by providing supervision for the instruction’s components. Specifically, we generate signals that aim at helping the model gradually understand components of the compositional instructions, as well as those that help it better understand spatial concepts, and show their benefit to the overall task for two datasets and two state-of-the-art (SOTA) models, especially when the training data is limited—which is usual in such tasks.

Highlights

  • Understanding the atomic components of an instruction is a necessary step to executing the instruction

  • We show that while end-to-end training fails to address the challenges of this task and performs poorly on instructions involving a single atomic component, knowledge-free auxiliary signals can be used to significantly improve performance by providing supervision for the instruction’s components

  • The blocks world is a popular platform to study instruction understanding in physically grounded environments and presents several key reasoning challenges

  • We generate signals that aim at helping the model gradually understand components of the compositional instructions, as well as those that help it better understand spatial concepts, and show their benefit to the overall task for two datasets and two state-of-the-art (SOTA) models, especially when the training data is limited


Summary

Data-Augmentation

Most of the instructions involve several spatial relations and a high degree of compositionality. For the instruction “Move the leftmost block ...”, the model learns from the received feedback that all blocks are to the right of the source block.

Quadrant Auxiliary Task: Aims at teaching the model absolute spatial concepts like top right corner. The hidden state is concatenated with the world state and passed through a fully-connected layer to solve the four-class classification problem. This signal is a deterministic function of the world and the target/source location, and requires no extra supervision. Training on this auxiliary task jointly with the main task enables the model to learn absolute spatial concepts such as southeast.

Evaluation uses mean block-distance (the Euclidean distance between the ground truth and the model prediction, normalized by the block length), and accuracy for Tan ...

The benefit of our approach is more pronounced with less training data.
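As a rough illustration of the two quantities mentioned above, the following Python sketch derives a four-class quadrant label from a block location and computes a mean block-distance normalized by the block length. It is a minimal sketch under stated assumptions, not the authors' implementation: the board is assumed to be centered at the origin with x growing east and y growing north, and the names `quadrant_label`, `mean_block_distance`, and `block_length` are hypothetical.

```python
import math

# Assumption (not from the source): the board is centered at (0, 0),
# x grows to the east and y grows to the north.
QUADRANT_NAMES = ("northeast", "northwest", "southwest", "southeast")


def quadrant_label(x, y):
    """Return a class index in {0, 1, 2, 3} for a location (x, y).

    The label depends only on the location itself, so it can be
    computed from the world state without any extra annotation.
    """
    if x >= 0 and y >= 0:
        return 0  # northeast
    if x < 0 and y >= 0:
        return 1  # northwest
    if x < 0 and y < 0:
        return 2  # southwest
    return 3  # southeast


def mean_block_distance(predictions, targets, block_length):
    """Average Euclidean distance between predicted and true locations,
    expressed in units of block length."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need equally sized, non-empty location lists")
    total = 0.0
    for (px, py), (tx, ty) in zip(predictions, targets):
        total += math.hypot(px - tx, py - ty) / block_length
    return total / len(predictions)
```

For example, `quadrant_label(0.5, -0.5)` falls in the southeast class, and two predictions that are off by 0 and 5 units with a block length of 1 give a mean block-distance of 2.5.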

Understanding why augmentation helps
A Appendix
Findings
Quadrant Subset Filters