Abstract

Learning to achieve a user-specified objective from a random starting position in unseen environments is challenging for image-guided navigation agents, which still lack long-horizon reasoning and semantic understanding. Inspired by the human memory mechanism, we introduce a neural multi-store memory network into a reinforcement learning framework for target-driven visual navigation. The proposed memory network uses three temporal stages of memory to build time dependency for better scene understanding. Sensory memory encodes observations and embeds transient information into working memory, which is short-term and realized by a gated recurrent neural network (RNN). The long-term memory then stores the latent state from each step of the RNN in a separate slot, and a self-attention reading mechanism retrieves goal-related information from long-term memory. In addition, to improve the scene generalization capability of the agent, we facilitate training of the visual representation with a self-supervised auxiliary task and image augmentation. This method navigates agents in unknown, visually realistic environments using only egocentric observations, without any position sensors or pretrained models. Evaluation on the Matterport3D dataset through the Habitat simulator demonstrates that our method outperforms state-of-the-art approaches.
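The three-stage memory pipeline described above can be sketched in code. The following is a minimal illustrative NumPy mock-up, not the paper's implementation: all dimensions, weight initializations, and function names (`gru_cell`, `attention_read`) are assumptions, the GRU stands in for the working memory, each hidden state is appended as one long-term memory slot, and a scaled dot-product attention over the slots stands in for the self-attention reading mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature dimension (illustrative, not from the paper)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: the short-term 'working memory' update."""
    z = sigmoid(Wz @ x + Uz @ h)             # update gate
    r = sigmoid(Wr @ x + Ur @ h)             # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h)) # candidate state
    return (1 - z) * h + z * h_tilde

def attention_read(query, memory):
    """Retrieve goal-related information from long-term memory slots
    via scaled dot-product attention (one weight per slot)."""
    scores = memory @ query / np.sqrt(query.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()                             # softmax over slots
    return w @ memory

# Random weights stand in for learned parameters.
params = [0.1 * rng.standard_normal((D, D)) for _ in range(6)]

h = np.zeros(D)     # working memory state
long_term = []      # long-term memory: one slot per step
for t in range(5):
    obs = rng.standard_normal(D)   # sensory memory: encoded egocentric observation
    h = gru_cell(obs, h, *params)  # working-memory update
    long_term.append(h.copy())     # store this step's latent state in its own slot

goal = rng.standard_normal(D)                         # goal-image embedding (query)
context = attention_read(goal, np.stack(long_term))   # goal-conditioned read
print(context.shape)  # (8,)
```

In a full agent the read-out `context` would be concatenated with the current working-memory state and fed to the policy head; here it simply demonstrates how the attention weights select among the stored per-step slots.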
