Abstract
Target-driven visual navigation is essential for many robotics applications and has gained increasing interest in recent years. In this work, inspired by animal cognitive mechanisms, we propose a novel navigation architecture that simultaneously learns an exploration policy and encodes the structure of the environment. First, to learn the exploration policy directly from raw visual input, we use deep reinforcement learning as the basic framework and let agents create rewards for themselves as learning signals. In our approach, the reward for the current observation is driven by curiosity and computed from a count-based bonus and temporal distance. While the agent learns the exploration policy, we use temporal distance to find waypoints in the observation sequence and incrementally describe the structure of the environment in a way that integrates episodic memory. Finally, space topological cognition is integrated into the model as a path-planning module and combined with a locomotion network to obtain a more generalized approach to navigation. We test our approach in DMLab, a visually rich 3D environment, and validate its exploration efficiency and navigation performance through extensive experiments. The experimental results show that our approach explores and encodes the environment more efficiently and copes better with stochastic objects. In navigation tasks, the agent can use space topological cognition to reach the target effectively and to guide detour behaviour when a path is unavailable, exhibiting good environmental adaptability.
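The abstract does not spell out the exact reward formula; purely as an illustration, the sketch below assumes the curiosity reward is a weighted sum of a count-based novelty bonus and a temporal-distance term, with weights α and β (the same weights swept in the highlights below). The `embed` and `temporal_distance` callables are hypothetical stand-ins for the learned observation encoder and the temporal-distance network.

```python
import numpy as np
from collections import defaultdict

class CuriosityReward:
    """Illustrative intrinsic reward: count-based novelty + temporal distance.

    Assumed form r = alpha * r_count + beta * r_dist; the paper's exact
    formulation may differ.
    """

    def __init__(self, embed, temporal_distance, alpha=0.5, beta=0.5):
        self.embed = embed                            # obs -> compact, hashable code
        self.temporal_distance = temporal_distance    # (obs, memory obs) -> step estimate
        self.alpha, self.beta = alpha, beta
        self.counts = defaultdict(int)                # visit counts per observation code
        self.memory = []                              # episodic memory of stored observations

    def reward(self, obs):
        code = self.embed(obs)
        self.counts[code] += 1
        r_count = 1.0 / np.sqrt(self.counts[code])    # classic count-based bonus

        # Temporal-distance term: an observation far (in steps) from everything
        # already stored in episodic memory is treated as novel.
        if self.memory:
            d = min(self.temporal_distance(obs, m) for m in self.memory)
            r_dist = float(d > 5)                     # hypothetical novelty threshold of 5 steps
        else:
            r_dist = 1.0
        if r_dist > 0:
            self.memory.append(obs)                   # store novel observations as candidate waypoints

        return self.alpha * r_count + self.beta * r_dist
```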
Highlights
To weigh the effects of the two reward terms against each other, we test parameter sets constrained by α + β = 1 and sampled at intervals of 0.1, and report two main results: the episode reward (the novelty reward accumulated by the agent within 1800 time steps) and the number of interactions required to encode the environment (see the parameter-sweep sketch after this list)
The Deep Recurrent Q-Network (DRQN) model is equipped with a long short-term memory (LSTM) and compensates for the memory deficit of the DQN: it remembers the target location and returns to it as many times as possible within an episode, but it requires a large number of time steps to find the target for the first time (see the recurrent Q-network sketch after this list)
We propose a novel navigation architecture consisting of intrinsic-motivation exploration and space topological cognition
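The weight sweep described above is simple enough to restate in a few lines. This sketch only reproduces the stated setup (α + β = 1, sampled at 0.1 intervals); `run_exploration` is a hypothetical driver function, not part of the paper.

```python
import numpy as np

results = {}
# alpha + beta = 1, sampled at 0.1 intervals: (0.1, 0.9), (0.2, 0.8), ..., (0.9, 0.1).
for alpha in np.round(np.arange(0.1, 1.0, 0.1), 1):
    beta = round(1.0 - alpha, 1)
    # Hypothetical driver: run exploration for 1800 time steps and record the
    # episode (novelty) reward and the interactions needed to encode the environment.
    # results[(alpha, beta)] = run_exploration(alpha=alpha, beta=beta, max_steps=1800)
    print(f"alpha={alpha}, beta={beta}")
```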
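For the DRQN model mentioned above, no architectural details are given here; the following PyTorch sketch only illustrates the general recurrent Q-network idea (a convolutional encoder followed by an LSTM and a Q-value head), with illustrative layer sizes rather than the baseline's actual configuration.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Minimal recurrent Q-network: CNN encoder -> LSTM -> Q-values.

    The LSTM hidden state carries information across time steps, which is what
    lets the agent "remember" the target location within an episode.
    """

    def __init__(self, num_actions, hidden_size=256):
        super().__init__()
        self.encoder = nn.Sequential(                 # 84x84 RGB frame -> feature vector
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(64 * 7 * 7, hidden_size, batch_first=True)
        self.q_head = nn.Linear(hidden_size, num_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 3, 84, 84)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.reshape(b * t, *frames.shape[2:]))
        feats = feats.reshape(b, t, -1)
        out, hidden = self.lstm(feats, hidden)
        return self.q_head(out), hidden               # Q-values for every time step
```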