Abstract

When deep reinforcement learning is applied to decision-making in real physical environments, improving sample efficiency while ensuring training stability is an urgent problem. To address it, several on-policy algorithms have been proposed and have achieved state-of-the-art performance. However, these on-policy algorithms, such as the proximal policy optimization (PPO) algorithm, suffer from extremely low sample efficiency. In this study, we propose a novel policy optimization method for robotic action control, named improved proximal policy optimization based on sample adaptive reuse and dual-clipping (SARD-PPO), which combines the training stability of on-policy methods with the sample efficiency of off-policy methods. First, we analyze the clipping mechanism of the PPO algorithm, devise a more constrained clipping mechanism based on the relationship between the clipping mechanism and the objective constraints, and develop a policy updating method that reuses samples from the prior policy in a more principled way. Second, we ensure training stability through element-level dual-clipping and through adaptive adjustment and reuse of the entire policy trajectory. Experimental results on six tasks in the MuJoCo benchmark show that SARD-PPO significantly improves policy performance while balancing training stability and sample efficiency, outperforming the baseline PPO algorithm and other state-of-the-art policy gradient methods that use on- and off-policy samples in terms of overall performance.
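To make the clipping mechanisms discussed above concrete, the following is a minimal sketch of the standard PPO clipped surrogate and a generic dual-clip variant (in the style of dual-clip PPO, where the objective is additionally bounded from below for negative advantages). This is an illustration of the general technique only, not the paper's SARD-PPO objective; the function names, the clipping parameter `eps`, and the dual-clip constant `c` are assumptions for the example.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard per-element PPO clipped surrogate:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

def dual_clip_objective(ratio, advantage, eps=0.2, c=3.0):
    """Generic dual-clip variant (illustrative, not the SARD-PPO objective):
    when A < 0 and the ratio is large, the standard min() is unbounded
    below, so an extra lower bound c * A (with c > 1) caps the loss."""
    standard = ppo_clip_objective(ratio, advantage, eps)
    return np.where(advantage < 0.0,
                    np.maximum(standard, c * advantage),
                    standard)
```

For example, with a stale sample whose ratio has drifted to 10 and whose advantage is -1, the standard surrogate evaluates to -10, whereas the dual-clipped version is bounded at c * A = -3, which is the stabilizing effect that motivates reusing older (more off-policy) samples safely.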
