Abstract

Sparse rewards are a challenging problem in reinforcement learning. Reward shaping is commonly used to address sparse rewards in specific tasks, but it often requires prior knowledge and manually designed reward functions, which are costly in many cases. Hindsight experience replay (HER) addresses sparse rewards in multi-goal scenarios by replacing the goal of a failed trajectory with a virtual goal. Our method integrates the ideas of reward shaping and HER, which has two advantages: first, it performs reward shaping automatically, without manually designed reward functions; second, it mitigates the problems that arise from the use of virtual goals in HER. Experimental results show that our method significantly improves performance in both the Bit-Flipping and MuJoCo environments.
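The goal-relabeling step the abstract attributes to HER can be sketched as follows. This is a minimal illustration under assumed names (`her_relabel`, `sparse_reward`, and the transition dictionary keys are all hypothetical), not the paper's implementation:

```python
import random

def her_relabel(trajectory, reward_fn, k=4):
    """Hindsight relabeling sketch: for each transition, sample achieved
    goals from later in the same trajectory and treat them as virtual
    goals (the "future" strategy from the HER paper).
    `trajectory` is a list of dicts with keys 'obs', 'action',
    'next_obs', 'achieved_goal'; these names are illustrative."""
    relabeled = []
    for t, step in enumerate(trajectory):
        future = trajectory[t:]
        # Sample up to k virtual goals actually achieved later on.
        for _ in range(min(k, len(future))):
            virtual_goal = random.choice(future)['achieved_goal']
            reward = reward_fn(step['achieved_goal'], virtual_goal)
            relabeled.append({**step, 'goal': virtual_goal, 'reward': reward})
    return relabeled

def sparse_reward(achieved, goal):
    """Typical sparse multi-goal reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0
```

Because the virtual goal was genuinely reached later in the trajectory, the relabeled transitions contain non-trivial reward signal even when the original episode failed, which is the sparse-reward mechanism the proposed method builds on.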
