Abstract

The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor landmarks and the local directions between them, in order to help annotators formulate typical, human references to those. The annotation task was crowdsourced on the AMT platform to construct a new Talk2Nav dataset with 10,714 routes. Our second contribution is a new learning method. Inspired by spatial cognition research on the mental conceptualization of navigational instructions, we introduce a soft dual attention mechanism defined over the segmented language instructions to jointly extract two partial instructions: one for matching the next upcoming visual landmark and the other for matching the local directions to that landmark. Along similar lines, we also introduce a spatial memory scheme to encode the local directional transitions. Our work takes advantage of advances in two lines of research: the mental formalization of verbal navigational instructions and the training of neural network agents for automatic wayfinding. Extensive experiments show that our method significantly outperforms previous navigation methods. For the demo video, dataset and code, please refer to our project page.
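To make the dual attention idea concrete, below is a minimal sketch of a soft dual attention over pre-segmented instruction embeddings: two attention heads, conditioned on the agent's current state, pool the segments into a landmark-matching vector and a direction-matching vector. The class, tensor names and scoring function are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftDualSegmentAttention(nn.Module):
    """Sketch: soft dual attention over language-segment embeddings.

    Two attention heads (hypothetical: one for the upcoming landmark, one for
    the local directions leading to it) each produce a weighted sum over the
    segment embeddings, yielding two partial-instruction representations.
    """
    def __init__(self, seg_dim, query_dim):
        super().__init__()
        # Independent scorers for landmark matching and direction matching.
        self.landmark_scorer = nn.Linear(seg_dim + query_dim, 1)
        self.direction_scorer = nn.Linear(seg_dim + query_dim, 1)

    def _attend(self, scorer, segments, query, mask):
        # segments: (B, S, seg_dim), query: (B, query_dim), mask: (B, S), 1 = valid segment
        q = query.unsqueeze(1).expand(-1, segments.size(1), -1)
        scores = scorer(torch.cat([segments, q], dim=-1)).squeeze(-1)   # (B, S)
        scores = scores.masked_fill(mask == 0, float('-inf'))
        weights = F.softmax(scores, dim=-1)                             # soft attention over segments
        return torch.bmm(weights.unsqueeze(1), segments).squeeze(1), weights

    def forward(self, segments, state, mask):
        # `state` stands in for the agent's current context (e.g. visual/memory features).
        landmark_instr, w_lm = self._attend(self.landmark_scorer, segments, state, mask)
        direction_instr, w_dir = self._attend(self.direction_scorer, segments, state, mask)
        return landmark_instr, direction_instr, (w_lm, w_dir)

# Example: 4 instruction segments of dim 256, agent state of dim 512
attn = SoftDualSegmentAttention(seg_dim=256, query_dim=512)
segs, state, mask = torch.randn(2, 4, 256), torch.randn(2, 512), torch.ones(2, 4)
lm_vec, dir_vec, _ = attn(segs, state, mask)
```

In this reading, the two attended vectors would feed two separate matching modules, one scoring agreement with the upcoming visual landmark and the other with the local directional transition towards it.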

Highlights

  • Consider that you are traveling as a tourist in a new city and are looking for a destination that you would like to visit

  • Inspired by research on the mental conceptualization of navigational instructions in spatial cognition (Tversky and Lee 1999; Michon and Denis 2001; Klippel and Winter 2005), we introduce a soft attention mechanism defined over the segmented language instructions to jointly extract two partial instructions: one for matching the upcoming visual landmark and the other for matching the spatial transition to that landmark

  • SPL↑ is used as the metric; bold numbers in the tables signify that the corresponding method gives the best performance among all compared methods (a reference computation of SPL is sketched below). The decomposition of the whole navigation instruction into landmark descriptions and local directional instructions, the attention map defined on language segments instead of English words, and the two clearly purposed matching modules make our method suitable for long-range vision-and-language navigation
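SPL (Success weighted by Path Length, Anderson et al. 2018) averages, over episodes, the binary success S_i weighted by the ratio of the shortest-path length l_i to the larger of l_i and the agent's actual path length p_i. A minimal sketch of this standard definition (function and variable names are illustrative):

```python
def success_weighted_path_length(successes, shortest_lengths, agent_lengths):
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    assert len(successes) == len(shortest_lengths) == len(agent_lengths)
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, agent_lengths):
        total += float(s) * l / max(p, l)
    return total / len(successes)

# Example: two episodes; the second failed, so it contributes 0.
print(success_weighted_path_length([1, 0], [120.0, 80.0], [150.0, 80.0]))  # 0.4
```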

Summary

Introduction

Consider that you are traveling as a tourist in a new city and are looking for a destination that you would like to visit. There is only one other work, by Chen et al. (2019), on natural-language-based outdoor navigation, which proposes an outdoor VLN dataset. While their method of data annotation through gaming (finding a hidden object at the goal position) is well designed, it is difficult to apply to longer routes. We develop an interactive visual navigation environment based on Google Street View and, more importantly, design a novel annotation method which highlights selected landmarks and the spatial transitions in between. This enhanced annotation method makes it feasible to crowdsource this complicated annotation task. The second challenge lies in training a long-range wayfinding agent. This learning task requires accurate visual attention and language attention, accurate self-localization, and a good sense of direction towards the goal.
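Such a Street View based environment can be thought of as a graph of street-level panoramas connected by traversable edges: the agent observes the panorama at its current node and repeatedly picks an outgoing edge or stops. The following is a minimal, hypothetical sketch of that loop; the node/edge structure, class names and action labels are illustrative assumptions, not the paper's actual environment API.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PanoNode:
    """One street-level panorama; `neighbors` maps a heading label to the next node id."""
    node_id: str
    neighbors: Dict[str, str] = field(default_factory=dict)

class StreetGraphEnv:
    """Minimal navigation environment over a panorama graph (illustrative only)."""
    def __init__(self, nodes: Dict[str, PanoNode], start: str, goal: str):
        self.nodes, self.goal = nodes, goal
        self.current = start

    def observation(self) -> PanoNode:
        return self.nodes[self.current]

    def step(self, action: str):
        # `action` is either 'stop' or a heading label present in the current node's neighbors.
        if action == 'stop':
            return self.observation(), self.current == self.goal, True
        self.current = self.nodes[self.current].neighbors.get(action, self.current)
        return self.observation(), False, False

# Tiny example graph: A -> B -> C, with C as the destination.
nodes = {
    'A': PanoNode('A', {'forward': 'B'}),
    'B': PanoNode('B', {'forward': 'C'}),
    'C': PanoNode('C'),
}
env = StreetGraphEnv(nodes, start='A', goal='C')
for act in ['forward', 'forward', 'stop']:
    obs, success, done = env.step(act)
print(success)  # True: the agent stopped at the goal node
```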

Related Works
Talk2Nav Dataset
Data Collection
Directional Instruction Annotation
Landmark Mining
Annotation and Dataset Statistics
Approach
Route Finding Task
Language
Visual Observation
Spatial Memory
Matching Module
Action Module
Learning
Implementation Details
Experiments
Comparison to Prior Works
Methods
Analysis
Ablation Studies
Qualitative Analysis
Findings
Conclusion