Abstract

The sparse reward problem is a significant challenge in reinforcement learning. Hindsight Experience Replay (HER) addresses it through goal relabeling, which allows the agent to learn from unsuccessful experiences. Some studies combine policy gradient methods with HER, yielding policy-based hindsight learning algorithms. However, policy-based hindsight learning relies on importance sampling, in which the importance weights are computed from the distribution of hindsight goals and the distribution of desired goals. When the two distributions differ significantly, the importance weights can become skewed, which distorts policy evaluation and leads to suboptimal policies. To address this, we formulate hindsight goal selection as a distribution-matching optimization problem. After augmenting the original desired goals using Kernel Density Estimation (KDE), we convert the distribution-matching problem into a bipartite graph matching problem that minimizes the sum of edge weights. Our optimal bipartite graph matching-based hindsight goal selection method selects the hindsight goals that are most closely aligned with the original goals. Experimental results show that algorithms combined with this selection method outperform the original algorithms. Visualizations also demonstrate the superiority of our method in selecting hindsight goals.
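
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_hindsight_goals`, the use of Euclidean distance as the edge weight, and the choice to augment desired goals by sampling from a Gaussian KDE are all assumptions made for illustration, since the abstract does not specify them. The sketch also assumes `n_select` does not exceed the number of achieved goals.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import gaussian_kde


def select_hindsight_goals(desired_goals, achieved_goals, n_select, seed=0):
    """Hypothetical helper: pick achieved goals that best match the desired goals."""
    desired_goals = np.asarray(desired_goals, dtype=float)    # (n_desired, dim)
    achieved_goals = np.asarray(achieved_goals, dtype=float)  # (n_achieved, dim)

    # Augment the desired goals by sampling from a KDE fitted to them
    # (assumed form of the augmentation; gaussian_kde expects shape (dim, n)).
    kde = gaussian_kde(desired_goals.T)
    targets = kde.resample(n_select, seed=seed).T             # (n_select, dim)

    # Edge weight between an augmented desired goal and a candidate hindsight
    # goal: Euclidean distance (an assumption, not stated in the abstract).
    diff = targets[:, None, :] - achieved_goals[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)                      # (n_select, n_achieved)

    # Minimum-weight bipartite matching: each target is paired with a
    # distinct achieved goal so that the summed edge weight is minimal.
    rows, cols = linear_sum_assignment(cost)
    return achieved_goals[cols], cost[rows, cols].sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    desired = rng.normal(1.0, 0.1, size=(20, 2))
    achieved = rng.uniform(0.0, 5.0, size=(50, 2))
    goals, total_weight = select_hindsight_goals(desired, achieved, n_select=8)
    print(goals.shape, total_weight)
```

Because the matching assigns each augmented desired goal to a different achieved goal, the selected hindsight goals are spread over the desired-goal distribution rather than collapsing onto the single nearest candidate, which is the intuition behind treating selection as distribution matching.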
