Abstract
Vision-and-language navigation (VLN) is an emerging research topic that has developed rapidly in recent years, and it is one of the representative tasks at the frontier of vision-language interaction. The goal of this task is to enable an agent to navigate autonomously, based on its visual perception of the environment, according to natural-language instructions given by a human. This paper reviews recent progress in vision-and-language navigation. First, the research content of the task is introduced, and its three main problems and challenges are analyzed: cross-modal semantic alignment, semantic understanding and reasoning, and enhancement of generalization ability. Second, commonly used datasets and evaluation metrics are listed. Third, research progress on the task is summarized from four perspectives, namely imitation learning, reinforcement learning, self-supervised learning, and other methods, and the performance of representative solutions is carefully compared and analyzed. Fourth, current research trends are discussed, mainly including navigation in continuous environments, comprehension of advanced complex instructions, and common-sense reasoning. Finally, future development directions such as 3D vision-and-language navigation, embodied question answering, and interactive question answering are further discussed and prospected.