Abstract

Reinforcement learning is a useful tool for training an agent to achieve a desired goal in sequential decision-making problems. It trains the agent to make decisions by exploiting the experience contained in the transitions produced by different decisions. To exploit this experience, most reinforcement learning methods replay the explored transitions by uniform sampling, but uniform sampling easily overlooks the most recently explored transitions. An alternative approach assigns each transition a priority based on its estimation error during training and replays transitions according to these priorities; however, it only updates the priorities of the transitions replayed at the current training step, so transitions with low priorities are ignored. In this paper, we propose clustering experience replay (CER) to effectively exploit the experience hidden in all explored transitions during training. CER clusters and replays transitions through a divide-and-conquer framework based on time division, as follows. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training. We implement clustering experience replay on TD3, yielding a new method, TD3_CER. Theoretical analysis and experiments show that TD3_CER is more effective than existing reinforcement learning methods. The source code can be downloaded from https://github.com/grcai/CER-Master.
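The period-based clustering scheme described above can be sketched as a replay buffer that accumulates transitions for one period, clusters that period's transitions with k-means when the period ends, and then samples across all clusters so that no kind of transition is starved. This is a minimal illustration, not the paper's implementation: transitions are stored as flat feature vectors, k-means is a plain Lloyd's-algorithm loop, and the uniform cluster-then-member sampling rule stands in for the conditional probability density function used by CER.

```python
import numpy as np

class ClusteringReplayBuffer:
    """Illustrative sketch of clustering experience replay (CER).

    Assumptions (not from the paper): each transition is a flat float
    vector, and sampling picks a cluster uniformly at random, then a
    member uniformly within it.
    """

    def __init__(self, k=4, period=1000, seed=0):
        self.k = k                # clusters per finished period
        self.period = period      # transitions per period
        self.rng = np.random.default_rng(seed)
        self.current = []         # transitions of the ongoing period
        self.clusters = []        # one array of transitions per cluster

    def add(self, transition):
        self.current.append(np.asarray(transition, dtype=float))
        if len(self.current) >= self.period:
            self._cluster_period()

    def _cluster_period(self):
        # End of a period: cluster this period's transitions with k-means
        # and keep each non-empty cluster as a separate sampling pool.
        X = np.stack(self.current)
        labels = self._kmeans(X, self.k)
        for c in range(self.k):
            members = X[labels == c]
            if len(members):
                self.clusters.append(members)
        self.current = []

    def _kmeans(self, X, k, iters=20):
        # Plain Lloyd's algorithm: alternate assignment and mean update.
        centers = X[self.rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = dists.argmin(1)
            for c in range(k):
                pts = X[labels == c]
                if len(pts):
                    centers[c] = pts.mean(0)
        return labels

    def sample(self, batch_size):
        # Sample a cluster first, then a member, so small clusters
        # (rare kinds of transitions) are still replayed often.
        pools = self.clusters or [np.stack(self.current)]
        batch = []
        for _ in range(batch_size):
            pool = pools[self.rng.integers(len(pools))]
            batch.append(pool[self.rng.integers(len(pool))])
        return np.stack(batch)
```

In a TD3-style training loop, `add` would be called once per environment step and `sample` once per gradient update; the cluster-first sampling is what keeps under-represented transition types in circulation, in contrast to uniform replay.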
