Self-Organizing Memory Based on Adaptive Resonance Theory for Vision and Language Navigation

Wansen Wu, Kai Xu, Yue Hu, Quanjun Yin, Long Qin

doi:10.3390/math11194192

Abstract

Vision and Language Navigation (VLN) is a task in which an agent must understand natural language instructions to reach a target location in a real-scene environment. To improve long-horizon planning, emerging research extends models with different types of memory structures, mainly topological maps or a hidden state vector. However, a fixed-length hidden state vector is often insufficient to capture long-term temporal context. In comparison, topological maps have been shown to be beneficial for many robotic navigation tasks. Therefore, we focus on building a feasible and effective topological map representation and using it to improve navigation performance and generalization across seen and unseen environments. This paper presents a Self-organizing Memory based on Adaptive Resonance Theory (SMART) module for incremental topological mapping and a framework for utilizing the SMART module to guide navigation. Based on fusion adaptive resonance theory networks, the SMART module can extract salient scenes from historical observations and build a topological map of the environmental layout. It provides a compact spatial representation and supports the discovery of novel shortcuts through inference while being explainable in terms of cognitive science. Furthermore, given a language instruction and on top of the topological map, we propose a vision–language alignment framework for navigational decision-making. Notably, the framework utilizes three off-the-shelf pre-trained models to perform landmark extraction, node–landmark matching, and low-level control, without any fine-tuning on human-annotated datasets. We validate our approach using the Habitat simulator on VLN-CE tasks, which provides a photo-realistic environment for the embodied agent in a continuous action space. The experimental results demonstrate that our approach achieves performance comparable to the supervised baseline.
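To make the idea of incremental, resonance-based mapping more concrete, the following is a minimal sketch of a single-channel fuzzy ART clusterer that creates a new node whenever no existing node passes the vigilance test, and links consecutively visited nodes into a graph. It is not the SMART module itself: the paper builds on multi-channel fusion ART, and the class name `ArtTopoMemory`, the parameter defaults, the edge rule, and the toy 2-D features are all assumptions made for illustration.

```python
import numpy as np


class ArtTopoMemory:
    """Hypothetical single-channel fuzzy ART memory with a simple edge rule."""

    def __init__(self, vigilance=0.8, alpha=0.01, beta=1.0):
        self.rho = vigilance      # vigilance: minimum match needed for resonance
        self.alpha = alpha        # choice parameter (small positive constant)
        self.beta = beta          # learning rate (1.0 = fast learning)
        self.weights = []         # one weight vector per map node
        self.edges = set()        # undirected edges between node indices
        self.prev = None          # node selected at the previous step

    def observe(self, feature):
        """Assign a feature in [0, 1]^d to a node, creating one if needed."""
        a = np.clip(np.asarray(feature, dtype=float), 0.0, 1.0)
        x = np.concatenate([a, 1.0 - a])  # complement coding

        def choice(j):
            w = self.weights[j]
            return np.minimum(x, w).sum() / (self.alpha + w.sum())

        winner = None
        # Try candidate nodes from highest to lowest choice value.
        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            w = self.weights[j]
            match = np.minimum(x, w).sum() / x.sum()
            if match >= self.rho:  # resonance: update the node's template
                self.weights[j] = (self.beta * np.minimum(x, w)
                                   + (1.0 - self.beta) * w)
                winner = j
                break
        if winner is None:  # no node resonates: commit a new node
            self.weights.append(x.copy())
            winner = len(self.weights) - 1

        # Assumed edge rule: connect nodes visited on consecutive steps.
        if self.prev is not None and self.prev != winner:
            self.edges.add((min(self.prev, winner), max(self.prev, winner)))
        self.prev = winner
        return winner
```

Feeding a sequence of observation features through `observe` yields one node index per step. In this sketch, a higher vigilance value produces more, finer-grained nodes, while a lower value merges more observations into the same node.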

Journal: Mathematics
Publication Date: Oct 7, 2023
License type: CC BY 4.0

Similar Papers

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler
Tsu-Jui Fu ... Xin Eric Wang
01 Jan 2020

Diagnosing the Environment Bias in Vision-and-Language Navigation
Yubo Zhang ... Mohit Bansal
24 Dec 2019

Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur ... Ivan Laptev
01 Oct 2021

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
Muhammad Zubair Irshad ... Zsolt Kira
30 May 2021
