Vision-Language Navigation Policy Learning and Adaptation.

Xin Wang, Lei Zhang, Jianfeng Gao, William Yang Wang, Qiuyuan Huang, Asli Celikyilmaz, Yuan-Fang Wang, Dinghan Shen

doi:10.1109/tpami.2020.2972281

Abstract

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: cross-modal grounding, ill-posed feedback, and generalization. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). Specifically, a matching critic provides an intrinsic reward that encourages global matching between instructions and trajectories, and a reasoning navigator performs cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms baseline methods by 10 percent on Success Rate weighted by Path Length (SPL) and achieves state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method that explores and adapts to unseen environments by imitating the agent's own past good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which substantially narrows the success rate gap between seen and unseen environments (from 30.7 to 11.7 percent).
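The abstract reports its main gain on SPL without defining the metric. As a minimal sketch, assuming the commonly used definition (per-episode success multiplied by the ratio of shortest-path length to the larger of the agent's path length and the shortest-path length, averaged over episodes), it could be computed as below. The function name `spl`, its parameters, and the example numbers are illustrative assumptions, not taken from the paper or its code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumption: an episode that starts at the goal (zero lengths) scores 1.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


# Illustrative values only: one optimal success, one success with a path
# twice the shortest length, and one failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 10.0])
print(score)  # 0.5
```

Under this definition a successful episode is penalised in proportion to how much longer the agent's path is than the shortest one, so SPL rewards efficient navigation, not just reaching the goal.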

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Feb 7, 2020
Citations: 7

Similar Papers

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Xin Wang ... Lei Zhang
26 Nov 2018

Self-supervised reinforcement learning-based energy management for a hybrid electric vehicle
Chunyang Qi ... Yiwen Zhu
Journal of Power Sources | VOL. 514
01 Dec 2021

Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping.
Franklin C Ogidi ... Ian Stavness
Plant phenomics (Washington, D.C.) | VOL. 5
01 Jan 2023

Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey.
Longlong Jing ... Yingli Tian
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 43
04 May 2020
