Abstract

Experience replay has been instrumental in achieving significant advances in reinforcement learning by improving data utilization. To further improve sampling efficiency, prioritized experience replay (PER) was proposed. This algorithm prioritizes experiences based on the temporal-difference (TD) error, enabling the agent to learn from the more valuable experiences stored in the experience pool. Although various prioritized algorithms have since been proposed, they ignore the dynamic changes in experience value during training and merely combine different priority criteria in a fixed or linear manner. In this paper, we present a novel prioritized experience replay algorithm called PERDP, which employs a dynamic priority adjustment framework. PERDP adaptively adjusts the weight of each criterion based on the average priority level of the experience pool and evaluates the value of experiences according to the current network. We apply this algorithm to the SAC model and conduct experiments in OpenAI Gym environments. The experimental results demonstrate that PERDP converges faster than PER.
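The following is a minimal, illustrative sketch of the general idea described above: a replay buffer that combines several priority criteria with weights adapted to the average priority level of the pool. The criteria (TD error and recency), the decay constants, and the weight-update rule are assumptions for illustration only and are not the paper's exact PERDP formulation.

```python
# Hypothetical sketch of a replay buffer mixing two priority criteria
# (TD-error magnitude and recency) with weights that adapt to the pool's
# average priority level. Not the exact PERDP algorithm.
import numpy as np


class DynamicPriorityReplay:
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities shape sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data = []              # stored transitions
        self.td_error = []          # criterion 1: magnitude of TD error
        self.recency = []           # criterion 2: how recently it was added
        self.weights = np.array([0.5, 0.5])   # per-criterion mixing weights

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.td_error.pop(0)
            self.recency.pop(0)
        # older experiences decay in recency; the newest gets full recency
        self.recency = [r * 0.999 for r in self.recency]
        self.data.append(transition)
        self.td_error.append(abs(td_error) + self.eps)
        self.recency.append(1.0)

    def _priorities(self):
        td = np.asarray(self.td_error)
        rc = np.asarray(self.recency)
        # normalize each criterion so the weights compare like with like
        td = td / td.sum()
        rc = rc / rc.sum()
        return self.weights[0] * td + self.weights[1] * rc

    def adapt_weights(self):
        # Illustrative adaptation rule: when the pool's average TD error
        # shrinks (the network has absorbed most of that signal), shift
        # weight toward recency so fresher experiences are favored.
        mean_td = np.mean(self.td_error)
        w_td = mean_td / (mean_td + np.mean(self.recency))
        self.weights = np.array([w_td, 1.0 - w_td])

    def sample(self, batch_size):
        p = self._priorities() ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx
```

In such a scheme the agent would call adapt_weights periodically (for example once per training epoch) and update the stored TD errors after each gradient step, so that sampling reflects the current network rather than a fixed priority criterion.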
