Abstract

In deep reinforcement learning, experience replay is commonly used to improve data efficiency and alleviate the forgetting of past experiences. However, sampling in online reinforcement learning is often influenced by an experience's index (its position in the replay buffer), which typically leads to unbalanced sampling. In addition, most experience replay methods ignore the differences among experiences and cannot make full use of all of them. In particular, many "near"-policy experiences that are highly relevant to the current policy are wasted, even though they are beneficial for improving sample efficiency. This paper theoretically analyzes the influence of various factors on experience sampling and then proposes a sampling method for experience replay based on frequency and similarity (FSER) to alleviate unbalanced sampling and increase the value of the sampled experiences. FSER prefers experiences that are rarely sampled or highly relevant to the current policy, and thus plays a critical role in balancing experience forgetting against experience wasting. Finally, FSER is combined with TD3 and achieves state-of-the-art results on multiple tasks.
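To make the idea concrete, the sketch below is a minimal, hypothetical illustration (not the paper's actual formulation): the buffer class name `FSERBuffer`, the weights `alpha`/`beta`, and the exponential similarity measure are all assumptions. It assigns each stored transition a priority that combines a rarity term (inverse of how often the transition has been sampled) with a similarity term (how close its action is to what the current policy would take in the same state), then samples proportionally to that priority.

```python
# Hypothetical sketch of frequency-and-similarity prioritized replay sampling.
# FSERBuffer, alpha, beta, and the similarity measure are illustrative assumptions.
import numpy as np

class FSERBuffer:
    def __init__(self, capacity, alpha=1.0, beta=1.0):
        self.capacity = capacity
        self.alpha = alpha      # weight on rarity (inverse sampling frequency)
        self.beta = beta        # weight on similarity to the current policy
        self.storage = []       # transitions: (state, action, reward, next_state, done)
        self.counts = []        # how many times each transition has been sampled

    def add(self, transition):
        # Drop the oldest transition when the buffer is full.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
            self.counts.pop(0)
        self.storage.append(transition)
        self.counts.append(0)

    def sample(self, batch_size, policy_action_fn):
        counts = np.asarray(self.counts, dtype=np.float64)
        # Rarity term: transitions that have been sampled less often get higher priority.
        rarity = 1.0 / (1.0 + counts)
        # Similarity term: "near"-policy experiences, whose stored action is close to
        # the current policy's action in the same state, get higher priority.
        similarity = np.array([
            np.exp(-np.linalg.norm(action - policy_action_fn(state)))
            for (state, action, reward, next_state, done) in self.storage
        ])
        priority = self.alpha * rarity + self.beta * similarity
        probs = priority / priority.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        for i in idx:
            self.counts[i] += 1
        return [self.storage[i] for i in idx]
```

In an actor-critic setting such as TD3, `policy_action_fn` would be the current deterministic actor; the `alpha`/`beta` trade-off then determines how strongly rarely-sampled experiences are protected from forgetting versus how strongly near-policy experiences are reused.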
