Abstract

Recent research in the robotics community suggests that future intelligent robots should be able to understand their environment and navigate to a goal location by communicating with their human users. Such tasks usually require agents to process multi-modal information effectively. Although multi-modal information processing has long been studied, how to effectively fuse different modalities of information remains a challenging problem. In this paper, we focus on the vision-and-dialog navigation (NDH) task. The NDH task was proposed for building dialog-enabled agents that can find a path to the goal location in unexplored environments by inferring navigation actions from the dialog history and the visual inputs. We first investigate what role visual features play in the NDH task. We observe the same trend that was previously observed in the vision-and-language navigation (VLN) task: the level of visual features used strongly affects model performance. In particular, agent models that use low-level visual features generalize poorly to unseen environments (i.e., environments not used in training). Models that use only high-level visual features perform better in unseen environments. However, these models suffer a significant performance drop in seen environments, which indicates that they cannot thoroughly understand and remember the seen environments. Based on this observation, we explore several ways to fuse these features. We propose a model that fuses the dialog feature with each modality of visual feature. In addition, the prediction of our model is an ensemble of jointly trained models, each of which focuses on a different modality. Our proposed method can be applied to any VLN or NDH model. Our results show that our method improves the performance of NDH models in both seen and unseen environments.
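
The ensemble idea described above can be illustrated with a minimal sketch. The abstract does not state how the per-modality predictions are combined, so the averaging of action probabilities used here is an assumption, as are all names (`softmax`, `ensemble_action_probs`, `pick_action`, and the branch labels `low_level` and `high_level`). Each branch is assumed to have already fused the dialog feature with one modality of visual feature and to output one logit per candidate navigation action.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw action scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_action_probs(branch_logits: Dict[str, List[float]]) -> List[float]:
    """Average the action distributions of all modality-specific branches.

    Assumption: simple averaging; the paper may combine branches differently.
    """
    if not branch_logits:
        raise ValueError("at least one branch is required")
    per_branch = [softmax(logits) for logits in branch_logits.values()]
    n_actions = len(per_branch[0])
    if any(len(p) != n_actions for p in per_branch):
        raise ValueError("all branches must score the same set of actions")
    return [
        sum(p[i] for p in per_branch) / len(per_branch)
        for i in range(n_actions)
    ]


def pick_action(branch_logits: Dict[str, List[float]]) -> int:
    """Return the index of the action with the highest ensemble probability."""
    probs = ensemble_action_probs(branch_logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits for three candidate actions from two branches.
example_logits = {
    "low_level": [2.0, 0.0, 0.0],   # branch using low-level visual features
    "high_level": [0.0, 1.0, 0.0],  # branch using high-level visual features
}
example_probs = ensemble_action_probs(example_logits)
example_choice = pick_action(example_logits)
```

In this sketch the two branches disagree on the best action, and the averaged distribution decides between them. In a jointly trained setting, the training loss would presumably be applied to the branches together rather than to each in isolation, but that detail is not given in the abstract.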
