Optimal bipartite graph matching-based goal selection for policy-based hindsight learning

Shiguang Sun, Xuguang Lan, Zeyang Liu, Hanbo Zhang, Xingyu Chen

doi:10.1016/j.neucom.2024.127734

Abstract

The sparse reward problem is a significant challenge in reinforcement learning. Hindsight Experience Replay (HER) addresses it through goal relabeling, which allows the agent to learn from unsuccessful experiences. Some studies combine policy gradient methods with HER, resulting in policy-based hindsight learning algorithms. However, policy-based hindsight learning relies on importance sampling, in which the importance weights are computed from the distribution of hindsight goals and the distribution of desired goals. When the two distributions differ significantly, the importance weights may become skewed, which distorts policy evaluation and leads to suboptimal policies. To address this, we propose modeling goal selection as a distribution-matching optimization problem. After augmenting the original desired goals using Kernel Density Estimation (KDE), we convert the distribution-matching problem into a bipartite graph matching problem that minimizes the sum of edge weights. Our optimal bipartite graph matching-based hindsight goal selection method selects the hindsight goals that are most closely aligned with the original goals. Experimental results show that algorithms combined with the optimal bipartite graph matching-based hindsight goal selection outperform the original algorithms. Visualizations also demonstrate the superiority of our method in selecting hindsight goals.
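The two steps named in the abstract, KDE-based augmentation of the desired goals followed by a minimum-weight bipartite matching against candidate hindsight goals, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_hindsight_goals`, the parameter `n_augmented`, and the use of Euclidean distance as the edge weight are assumptions made for illustration, and the matching is solved with SciPy's general-purpose assignment solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import gaussian_kde


def select_hindsight_goals(desired_goals, achieved_goals, n_augmented=32, seed=0):
    """Hypothetical helper: match KDE-augmented desired goals to achieved goals.

    desired_goals:  array of shape (n_desired, dim)
    achieved_goals: array of shape (n_achieved, dim), with n_achieved >= n_augmented
    Returns the augmented goals, the indices of the selected achieved goals,
    and the total matching cost.
    """
    # Step 1: fit a Gaussian KDE to the desired goals and sample extra goals.
    # gaussian_kde expects data of shape (dim, n_samples).
    kde = gaussian_kde(desired_goals.T)
    augmented = kde.resample(n_augmented, seed=seed).T  # (n_augmented, dim)

    # Step 2: build the bipartite cost matrix. Euclidean distance is an
    # assumed edge weight; the abstract only says "sum of weights".
    diff = augmented[:, None, :] - achieved_goals[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)  # (n_augmented, n_achieved)

    # Step 3: minimum-weight matching; each augmented goal is paired with
    # a distinct achieved goal.
    rows, cols = linear_sum_assignment(cost)
    return augmented, cols, cost[rows, cols].sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    desired = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(16, 2))
    achieved = rng.uniform(-2.0, 2.0, size=(200, 2))
    _, idx, total = select_hindsight_goals(desired, achieved)
    print("selected indices:", idx[:5], "total cost:", round(float(total), 3))
```

In this sketch the selected achieved goals would serve as the hindsight goals for relabeling; because the matching is one-to-one, no single achieved goal can be reused to cover several desired goals.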

Full Text