Abstract

Reinforcement learning is a useful tool for training an agent to achieve a desired goal in sequential decision-making problems. It trains the agent to make decisions by exploiting the experience contained in the transitions produced by different decisions. To exploit this experience, most reinforcement learning methods replay the explored transitions by uniform sampling, but uniform sampling easily overlooks the most recently explored transitions. An alternative approach assigns each transition a priority based on its estimation error during training and replays transitions according to these priorities; however, it only updates the priorities of the transitions replayed at the current training step, so transitions with low priorities are ignored. In this paper, we propose clustering experience replay (CER) to effectively exploit the experience hidden in all explored transitions during training. CER clusters and replays transitions through a divide-and-conquer framework based on time division, as follows. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training. We implement clustering experience replay on TD3, yielding a new method, TD3_CER. Theoretical analysis and experiments show that TD3_CER is more effective than existing reinforcement learning methods. The source code can be downloaded from https://github.com/grcai/CER-Master.
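The period-based clustering scheme described above can be sketched as a replay buffer that accumulates transitions for one period, clusters that period's transitions with k-means when the period ends, and then samples across all clusters so that no kind of transition is starved. This is a minimal illustration, not the paper's implementation: transitions are stored as flat feature vectors, k-means is a plain Lloyd's-algorithm loop, and the uniform cluster-then-member sampling rule stands in for the conditional probability density function used by CER.

```python
import numpy as np

class ClusteringReplayBuffer:
    """Illustrative sketch of clustering experience replay (CER).

    Assumptions (not from the paper): each transition is a flat float
    vector, and sampling picks a cluster uniformly at random, then a
    member uniformly within it.
    """

    def __init__(self, k=4, period=1000, seed=0):
        self.k = k                # clusters per finished period
        self.period = period      # transitions per period
        self.rng = np.random.default_rng(seed)
        self.current = []         # transitions of the ongoing period
        self.clusters = []        # one array of transitions per cluster

    def add(self, transition):
        self.current.append(np.asarray(transition, dtype=float))
        if len(self.current) >= self.period:
            self._cluster_period()

    def _cluster_period(self):
        # End of a period: cluster this period's transitions with k-means
        # and keep each non-empty cluster as a separate sampling pool.
        X = np.stack(self.current)
        labels = self._kmeans(X, self.k)
        for c in range(self.k):
            members = X[labels == c]
            if len(members):
                self.clusters.append(members)
        self.current = []

    def _kmeans(self, X, k, iters=20):
        # Plain Lloyd's algorithm: alternate assignment and mean update.
        centers = X[self.rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = dists.argmin(1)
            for c in range(k):
                pts = X[labels == c]
                if len(pts):
                    centers[c] = pts.mean(0)
        return labels

    def sample(self, batch_size):
        # Sample a cluster first, then a member, so small clusters
        # (rare kinds of transitions) are still replayed often.
        pools = self.clusters or [np.stack(self.current)]
        batch = []
        for _ in range(batch_size):
            pool = pools[self.rng.integers(len(pools))]
            batch.append(pool[self.rng.integers(len(pool))])
        return np.stack(batch)
```

In a TD3-style training loop, `add` would be called once per environment step and `sample` once per gradient update; the cluster-first sampling is what keeps under-represented transition types in circulation, in contrast to uniform replay.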
