Abstract

Prioritized experience replay (PER) selects experience data according to the magnitude of the temporal-difference (TD) error, which improves the utilization of experience in deep reinforcement learning methods. However, because the TD error must be recomputed every time experience data are sampled, the computational complexity of PER is high. In addition, the hyperparameters of PER affect the learning process and must be tuned carefully. To address these problems, we propose meta-learning-based experience replay buffer separation (MSER) in this paper. First, the original experience replay buffer is divided into a successful-experience buffer and a failure-experience buffer, which store the experience data from successful and failed exploration respectively. Second, experience data are sampled randomly from the two replay buffers according to a mixing ratio, and this ratio is learned by a neural network we design. Finally, we conduct experiments on trajectory planning for a robot manipulator in a V-REP simulation environment. Extensive experiments show that MSER improves the convergence rate of DDPG by up to 9.1% compared with DDPG using PER; the mean reward at convergence rises by up to 3.3% and its standard deviation decreases by 16.9%. In the evaluation, the success rate of trajectory planning for the robot manipulator reaches 99%.
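To make the buffer-separation idea concrete, below is a minimal sketch of the two-buffer sampling scheme described in the abstract. The class name, buffer capacity, and transition format are illustrative assumptions rather than details from the paper, and in MSER the mixing ratio would be produced by the learned network, whereas here it is simply passed in as a float.

```python
# Minimal sketch of success/failure buffer separation with ratio-based sampling.
# All names (SeparatedReplayBuffer, capacity, etc.) are hypothetical, not from the paper.
import random
from collections import deque


class SeparatedReplayBuffer:
    """Stores successful and failed transitions in separate buffers and
    draws a mini-batch from both according to a mixing ratio."""

    def __init__(self, capacity=100_000):
        self.success = deque(maxlen=capacity)  # transitions from successful exploration
        self.failure = deque(maxlen=capacity)  # transitions from failed exploration

    def add(self, transition, is_success):
        # Route each transition to the buffer matching its exploration outcome.
        (self.success if is_success else self.failure).append(transition)

    def sample(self, batch_size, ratio):
        """ratio: fraction of the batch drawn from the success buffer.
        In MSER this ratio would be predicted by the learned network;
        here it is assumed to be a plain float in [0, 1]."""
        n_success = min(int(round(batch_size * ratio)), len(self.success))
        n_failure = min(batch_size - n_success, len(self.failure))
        batch = random.sample(list(self.success), n_success) + \
                random.sample(list(self.failure), n_failure)
        random.shuffle(batch)
        return batch
```

A DDPG-style agent would call `add(transition, is_success)` at the end of each episode and `sample(batch_size, ratio)` at each update step, with the ratio supplied by the meta-learned network.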
