Abstract

Sparse rewards are a challenging problem in reinforcement learning. Reward shaping is commonly used to address sparse rewards in specific tasks, but it often requires prior knowledge and manually designed reward functions, which are costly in many cases. Hindsight experience replay (HER) addresses sparse rewards in multi-goal scenarios by replacing the goal of a failed trajectory with a virtual goal. Our method integrates the ideas of reward shaping and HER, which has two advantages: first, it performs reward shaping automatically, without manually designed reward functions; second, it mitigates the problems that arise from the use of virtual goals in HER. Experimental results show that our method significantly improves performance in both the Bit-Flipping and MuJoCo environments.
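The goal-relabeling step the abstract attributes to HER can be sketched as follows. This is a minimal illustration under assumed names (`her_relabel`, `sparse_reward`, and the transition dictionary keys are all hypothetical), not the paper's implementation:

```python
import random

def her_relabel(trajectory, reward_fn, k=4):
    """Hindsight relabeling sketch: for each transition, sample achieved
    goals from later in the same trajectory and treat them as virtual
    goals (the "future" strategy from the HER paper).
    `trajectory` is a list of dicts with keys 'obs', 'action',
    'next_obs', 'achieved_goal'; these names are illustrative."""
    relabeled = []
    for t, step in enumerate(trajectory):
        future = trajectory[t:]
        # Sample up to k virtual goals actually achieved later on.
        for _ in range(min(k, len(future))):
            virtual_goal = random.choice(future)['achieved_goal']
            reward = reward_fn(step['achieved_goal'], virtual_goal)
            relabeled.append({**step, 'goal': virtual_goal, 'reward': reward})
    return relabeled

def sparse_reward(achieved, goal):
    """Typical sparse multi-goal reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0
```

Because the virtual goal was genuinely reached later in the trajectory, the relabeled transitions contain non-trivial reward signal even when the original episode failed, which is the sparse-reward mechanism the proposed method builds on.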
