Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

Tung M Luu, Chang D Yoo

doi:10.1109/access.2021.3069975

Open Access

https://doi.org/10.1109/access.2021.3069975

Abstract

This paper proposes Hindsight Goal Ranking (HGR), a method for prioritizing replay experience that overcomes a limitation of Hindsight Experience Replay (HER), which generates hindsight goals by uniform sampling. HGR samples with higher probability the states visited in an episode that yield larger temporal difference (TD) error, which serves as a proxy for how much the RL agent can learn from an experience. Sampling proceeds in two steps: first, an episode is sampled from the replay buffer according to the average TD error of its experiences; then, for the sampled episode, a hindsight goal is drawn from the future visited states, with goals leading to larger TD error sampled with higher probability. Combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, the proposed method learns significantly faster than the same algorithm without prioritization on four challenging simulated robotic manipulation tasks. The empirical results show that HGR uses samples more efficiently than previous methods across all tasks.
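
The two-step sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical buffer layout in which each episode stores a table `td`, where `td[t]` lists the TD errors of transition `t` relabelled with the states reached at steps `t+1, t+2, ...` as goals. The choice to pick the transition uniformly, the constant `EPS`, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import random

EPS = 1e-6  # hypothetical constant keeping every candidate's probability non-zero


def episode_priority(td):
    """Mean |TD error| over all (transition, future goal) pairs of one episode."""
    values = [abs(e) for row in td for e in row]
    return sum(values) / len(values) + EPS


def sample_episode(buffer, rng):
    """Step 1: draw an episode index proportionally to its average |TD error|."""
    weights = [episode_priority(td) for td in buffer]
    return rng.choices(range(len(buffer)), weights=weights, k=1)[0]


def sample_hindsight_goal(td, rng):
    """Step 2: pick a transition t uniformly (an assumption), then a future
    step k > t with probability proportional to |TD error| of the pair (t, k)."""
    t = rng.randrange(len(td))
    weights = [abs(e) + EPS for e in td[t]]
    offset = rng.choices(range(len(td[t])), weights=weights, k=1)[0]
    return t, t + 1 + offset


# Usage: two toy episodes, each with two transitions (states s0, s1, s2).
rng = random.Random(0)
buffer = [
    [[0.1, 0.1], [0.1]],  # episode 0: small TD errors
    [[2.0, 0.0], [3.0]],  # episode 1: large TD errors, favoured in step 1
]
i = sample_episode(buffer, rng)
t, k = sample_hindsight_goal(buffer[i], rng)
print(f"episode {i}: relabel transition {t} with the state reached at step {k}")
```

In this sketch the priorities are used directly as sampling weights. A practical prioritized buffer would typically also refresh the stored TD errors after each update and correct for the sampling bias, which the sketch omits.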

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 29	License type: CC BY 4.0
