Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients.

Paulo Rauber, Filipe Mutz, Jürgen Schmidhuber, Avinash Ummadisingu

doi:10.1162/neco_a_01387

Abstract

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
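
The notion of hindsight described in the abstract can be illustrated with a toy sketch. This is not the estimator from the letter, which the abstract does not specify; it only shows how a single trajectory, collected while one goal was intended, can be scored against other goals. The bit-tuple states, the exact-match reward, and the names `sparse_reward` and `hindsight_returns` are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

State = Tuple[int, ...]


def sparse_reward(state: State, goal: State) -> float:
    """Assumed sparse reward: 1.0 only when the state equals the goal."""
    return 1.0 if state == goal else 0.0


def hindsight_returns(
    states: Sequence[State], goals: Sequence[State]
) -> Dict[State, float]:
    """Score one trajectory against every candidate goal.

    states[0] is the initial state; rewards are counted for the
    states reached afterwards (states[1:]).
    """
    return {
        goal: sum(sparse_reward(s, goal) for s in states[1:])
        for goal in goals
    }


# Hypothetical episode: the agent intended to reach (1, 1) but only
# flipped the first bit and then stayed there.
trajectory = [(0, 0), (1, 0), (1, 0)]
intended_goal = (1, 1)
candidate_goals = [(0, 0), (0, 1), (1, 0), (1, 1)]

returns = hindsight_returns(trajectory, candidate_goals)
for goal, value in returns.items():
    marker = " (intended)" if goal == intended_goal else ""
    print(f"goal={goal}{marker}: return={value}")
```

In this hypothetical episode the intended goal yields a return of zero, so the trajectory carries no reward signal for it, while the same trajectory achieves the goal (1, 0). How such information enters a policy gradient estimate is the subject of the letter and is not shown here.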

Journal: Neural Computation
Publication Date: May 13, 2021
Citations: 5

Similar Papers

Q-Prop: Sample-efficient policy gradient with an off-policy critic
28 Feb 2017

Towards Generalization and Efficiency in Reinforcement Learning
02 Jul 2019

Learning models in interdependence situations
18 Nov 2015

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning
Shilei Li ... Jiongming Su
ACM Transactions on Intelligent Systems and Technology | VOL. 12
03 Jun 2021
