Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

Xiwen Liang, Xiaodan Liang, Bing Wang, Bingqian Lin, Yi Zhu, Fengda Zhu

doi:10.1609/aaai.v36i2.20050

Abstract

The vision-language navigation (VLN) task requires an agent to reach a target by following a natural-language instruction. Previous works learn to navigate step by step along an instruction. However, these works may fail to discriminate the similarities and discrepancies across instruction-trajectory pairs, and they ignore the temporal continuity of sub-instructions. These problems hinder agents from learning distinctive vision-and-language representations, which harms the robustness and generalizability of the navigation policy. In this paper, we propose a Contrastive Instruction-Trajectory Learning (CITL) framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation. Specifically, we propose: (1) a coarse-grained contrastive learning objective that enhances vision-and-language representations by contrasting the semantics of full trajectory observations and of full instructions, respectively; (2) a fine-grained contrastive learning objective that perceives instructions by leveraging the temporal information of the sub-instructions; (3) a pairwise sample-reweighting mechanism for contrastive learning that mines hard samples and hence mitigates the influence of data sampling bias. CITL can be easily integrated with VLN backbones to form a new learning paradigm and achieve better generalizability in unseen environments. Extensive experiments show that the model with CITL surpasses the previous state-of-the-art methods on R2R, R4R, and RxR.
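To make the idea of a contrastive objective with sample reweighting concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor embedding. It is not the formulation used in CITL, which the abstract does not specify. The function names `cosine` and `info_nce`, the `temperature` value, and the `neg_weights` argument are all assumptions introduced for illustration; `neg_weights` only stands in for the general idea of giving hard negatives more influence.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1, neg_weights=None):
    """Generic InfoNCE-style loss for one anchor (illustrative only).

    neg_weights is a hypothetical per-negative weight (1.0 each by default)
    standing in for a hard-sample reweighting scheme; it is an assumption,
    not the mechanism described in the paper.
    """
    if neg_weights is None:
        neg_weights = [1.0] * len(negatives)
    pos_logit = cosine(anchor, positive) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    # Subtract the largest logit before exponentiating for numerical stability.
    shift = max([pos_logit] + neg_logits)
    pos_term = math.exp(pos_logit - shift)
    neg_term = sum(w * math.exp(l - shift)
                   for w, l in zip(neg_weights, neg_logits))
    return -math.log(pos_term / (pos_term + neg_term))

# Toy 2-D embeddings: the positive is close to the anchor, the negatives are not.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(anchor, positive, negatives)
```

In this sketch the loss is small when the anchor is closer to its positive than to the negatives, and raising the weight of a negative increases its contribution to the denominator, so the loss grows.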

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 6

Similar Papers

Parity-based robust data-driven fault detection for nonlinear systems using just-in-time learning approach
Muhammad Asim Abbasi ... Ghulam Mustafa
Transactions of the Institute of Measurement and Control | VOL. 42
08 Jan 2020

Vision-and-Dialog Navigation by Fusing Cross-modal features
Hongxu Nie ... Min Dong
18 Jul 2021

Adaptive virtual metrology method based on Just-in-time reference and particle filter for semiconductor manufacturing
Haoshu Cai ... Jay Lee
Measurement | VOL. 168
12 Aug 2020

The role of cognitive factors on the development and evolution of the vocabulary
01 Jan 2018
