Abstract

Target-driven visual navigation is essential for many applications in robotics and has gained increasing interest in recent years. In this work, inspired by animal cognitive mechanisms, we propose a novel navigation architecture that simultaneously learns an exploration policy and encodes the structure of the environment. First, to learn the exploration policy directly from raw visual input, we use deep reinforcement learning as the basic framework and allow agents to create rewards for themselves as learning signals. In our approach, the reward for the current observation is driven by curiosity and is computed from a count-based novelty measure and temporal distance. While agents learn the exploration policy, we use temporal distance to find waypoints in the observation sequence and incrementally describe the structure of the environment in a way that integrates episodic memory. Finally, space topological cognition is integrated into the model as a path-planning module and combined with a locomotion network to obtain a more generalized approach to navigation. We test our approach in DeepMind Lab (DMLab), a visually rich 3D environment, and validate its exploration efficiency and navigation performance through extensive experiments. The experimental results show that our approach explores and encodes the environment more efficiently and copes better with stochastic objects. In navigation tasks, agents can use space topological cognition to reach the target effectively and to guide detour behaviour when a path is unavailable, exhibiting good environmental adaptability.
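To make the reward computation described above concrete, the following is a minimal sketch, assuming an inverse-square-root count-based bonus and a thresholded temporal-distance bonus combined with weights α and β (α + β = 1). The function names, functional forms, and default parameters are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import defaultdict

# Hypothetical sketch of a curiosity-driven intrinsic reward that mixes a
# count-based novelty bonus with a temporal-distance bonus (alpha + beta = 1).
# The functional forms below are illustrative assumptions only.

state_counts = defaultdict(int)  # visit counts over (discretised) observations

def count_based_bonus(state_key):
    """Novelty bonus that decays as a state is visited more often."""
    state_counts[state_key] += 1
    return 1.0 / math.sqrt(state_counts[state_key])

def temporal_distance_bonus(distance, threshold=5):
    """Reward observations that are temporally far from stored memory.

    `distance` would come from a learned network estimating how many steps
    separate the current observation from previously stored ones.
    """
    return 1.0 if distance > threshold else 0.0

def intrinsic_reward(state_key, distance, alpha=0.5, beta=0.5):
    """Weighted combination of the two novelty signals (alpha + beta = 1)."""
    return alpha * count_based_bonus(state_key) + beta * temporal_distance_bonus(distance)
```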

Highlights

  • To weigh the relative effects of the two novelty terms, we test different parameter settings that satisfy α + β = 1, sampled at intervals of 0.1, and report two main results: the episode reward (the novelty reward collected by the agent within 1800 time steps) and the number of interactions required to encode the environment

  • The Deep Recurrent Q-Network (DRQN) model is equipped with a long short-term memory (LSTM) unit and compensates for the memory deficit of the DQN: it remembers the target location and returns to it as many times as possible within an episode, but it requires a large number of time steps to find the target for the first time (a minimal sketch of such a recurrent Q-network follows this list)

  • We propose a novel navigation architecture consisting of intrinsic-motivation exploration and space topological cognition
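For reference, the recurrent baseline mentioned in the second bullet can be sketched as a convolutional encoder feeding an LSTM whose hidden state carries memory across time steps, followed by a linear Q-value head. This is a minimal sketch assuming an 84x84 RGB input and standard DQN-style layer sizes, not the exact configuration used in the experiments.

```python
import torch
import torch.nn as nn

# Minimal DRQN-style sketch: a convolutional encoder followed by an LSTM
# whose hidden state carries memory across time steps. Layer sizes and the
# 84x84 RGB input are illustrative assumptions.

class DRQN(nn.Module):
    def __init__(self, num_actions, hidden_size=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=64 * 7 * 7, hidden_size=hidden_size,
                            batch_first=True)
        self.q_head = nn.Linear(hidden_size, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, 3, 84, 84)
        b, t = obs_seq.shape[:2]
        feats = self.encoder(obs_seq.view(b * t, *obs_seq.shape[2:]))
        feats = feats.view(b, t, -1)
        out, hidden = self.lstm(feats, hidden)
        return self.q_head(out), hidden  # Q-values for each time step

# Usage: q_values, hidden = DRQN(num_actions=8)(torch.zeros(1, 4, 3, 84, 84))
```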


