Abstract

Vanilla policy gradient methods suffer from high variance, which leads to unstable policies during training: the policy's performance fluctuates drastically between iterations. To address this issue, we analyze the policy optimization process of a navigation method based on deep reinforcement learning (DRL) that uses asynchronous gradient descent for optimization. We present a variant navigation method (asynchronous proximal policy optimization navigation, appoNav) that guarantees monotonic policy improvement during policy optimization. Our experiments are conducted in DeepMind Lab, and the results show that artificial agents trained with appoNav perform better than those trained with the compared algorithm.
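The objective itself is not reproduced in this summary; as a rough illustration, a minimal PyTorch sketch of the standard PPO clipped surrogate loss, the kind of constrained update that appoNav builds on, might look as follows. The function name, the clipping coefficient value, and the tensor layout are our assumptions, not details taken from the paper.

    import torch

    def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        # Probability ratio between the updated policy and the old policy.
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        # Clipping bounds how far a single update can move the policy,
        # which underlies the (approximate) monotonic improvement property.
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Negated because optimizers minimize; PPO maximizes the surrogate.
        return -torch.min(unclipped, clipped).mean()

By contrast, an unconstrained policy gradient update places no bound on the per-step policy change, which is the source of the instability described above.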

Highlights

  • Navigation in an unstructured environment is one of the most important abilities for mobile robotics and artificial agents [1,2,3]

  • Traditional methods mainly divide navigation into several parts [4]: simultaneous localization and mapping (SLAM) [5,6,7], path planning [8], and semantic segmentation [9, 10]. These methods are not end-to-end algorithms: each part is a challenging research subject in its own right, and fusing the parts often leads to large computational errors

  • To reduce the fusion error, we focus on end-to-end navigation based on deep reinforcement learning, where navigational abilities can emerge as a byproduct of an artificial agent learning a policy through reward maximization


Summary

Introduction

Navigation in an unstructured environment is one of the most important abilities for mobile robotics and artificial agents [1,2,3]. To reduce the fusion error introduced by traditional multi-stage pipelines, we focus on end-to-end navigation based on deep reinforcement learning, where navigational abilities can emerge as a byproduct of an artificial agent learning a policy through reward maximization. DeepMind Lab can be used to study how autonomous artificial agents learn complex tasks in large, partially observed, and visually diverse worlds. Mirowski et al. [21] proposed a DRL navigation method based on A3C [18], augmented with auxiliary learning targets, to train artificial agents to navigate in DeepMind Lab. For ease of expression, we refer to DRL navigation using A3C as a3cNav. In this paper, we analyze the policy optimization issues of navigation based on the vanilla policy gradient; this type of navigation cannot control the change of expected advantage when an artificial agent learns to navigate in a maze. Experimental results show that an artificial agent trained with appoNav learns a better navigation policy in DeepMind Lab and exhibits a lower standard deviation than a3cNav.
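To make the contrast concrete, a vanilla (REINFORCE/A3C-style) policy gradient loss can be sketched as below. Nothing in it limits how far a single update moves the policy, so a large advantage estimate can change the behaviour drastically between iterations. This sketch is our illustration of the standard objective, not code from the paper.

    import torch

    def vanilla_pg_loss(log_probs, advantages):
        # Standard policy gradient objective (negated for gradient descent).
        # The advantage is treated as a constant with respect to the policy,
        # and nothing here controls the change of the expected advantage
        # from one update to the next.
        return -(log_probs * advantages.detach()).mean()

The clipped surrogate sketched after the abstract addresses exactly this: it caps the probability ratio so that each update stays close to the previous policy.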

Related Work
Background
Approach
Experiments
Conclusion