Hindsight Experience Replay Research Articles

In deep reinforcement learning (RL), exploration is important for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in an environment of sparse and binary rewards, such as the real-time online detection of network security by verifying whether the network is “normal or anomalous.” Prior studies have illustrated that a prioritized replay memory based on a complex temporal difference error provides superior theoretical results. However, another implementation illustrated that in certain environments, the prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, our objective is inspired by a key challenge of hindsight experience replay, which uses additional buffers corresponding to each different goal. Therefore, we attempt to exploit multiple random ε-greedy buffers to enhance exploration and reach closer-to-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning from our method through an experimental comparison of DQN and deep deterministic policy gradient, covering discrete actions as well as continuous control, in complete symmetric environments.
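The abstract does not give implementation details, so the following is only a minimal sketch of the two ingredients it names: ε-greedy action selection and several replay buffers sampled at random. The class name `MultiEpsilonBuffers`, the one-buffer-per-ε layout, and uniform sampling over the union of buffers are assumptions made for illustration, not the authors' method.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class MultiEpsilonBuffers:
    """Hypothetical container: one FIFO replay buffer per exploration rate."""

    def __init__(self, epsilons, capacity):
        self.epsilons = list(epsilons)
        # Each buffer drops its oldest transition once capacity is reached.
        self.buffers = [deque(maxlen=capacity) for _ in self.epsilons]

    def add(self, buffer_index, transition):
        """Store a transition in the buffer tied to the epsilon that produced it."""
        self.buffers[buffer_index].append(transition)

    def sample(self, batch_size, rng):
        """Sample uniformly from the union of all buffers (an assumed scheme)."""
        pool = [t for buf in self.buffers for t in buf]
        return rng.sample(pool, min(batch_size, len(pool)))
```

In use, an agent would pick a buffer index, act with `epsilon_greedy(q_values, buffers.epsilons[i], rng)`, store the resulting transition with `add(i, transition)`, and train on batches drawn with `sample`.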


In reinforcement learning (RL), a reinforcement signal may be infrequent and delayed, not appearing immediately after the action that triggered the reward. Tracing back which sequence of actions contributes to delayed rewards, i.e., credit assignment (CA), is one of the biggest challenges in RL. This challenge is aggravated under sparse binary rewards, especially when rewards are given only after successful completion of the task. To this end, a novel method is proposed that consists of key-action detection, among the sequence of actions performed for a task under sparse binary rewards, and a CA strategy. The key-action, defined as the most important action contributing to the reward, is detected by a deep neural network that predicts future rewards based on the environment information. The rewards are re-assigned to the key-action and its adjacent actions, defined as adjacent-key-actions. This re-assignment process increases the success rate and convergence speed during training. For efficient re-assignment, two CA strategies are considered as part of the proposed method. The proposed method is combined with hindsight experience replay (HER) for experiments in the OpenAI Gym suite robotics environment. The experiments demonstrate that the proposed method can detect key-actions and outperform HER, increasing success rate and convergence speed, in the Fetch slide task, which is more exacting than other tasks but is addressed by few publications in the literature. From the experiments, a guideline for selecting a CA strategy according to goal location is provided through goal distribution analysis with a dot map.
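The re-assignment idea described above can be illustrated with a short sketch. The abstract does not specify how the key-action is chosen from the network's predictions or how much reward is moved, so the criterion used here (largest jump in predicted future reward), the symmetric `window`, and the additive `bonus` are all hypothetical choices, and the predictions are passed in as a plain list rather than produced by a network.

```python
def find_key_action(predicted_rewards):
    """Return the step index with the largest jump in predicted future reward.

    Assumed criterion for illustration only; the source does not define it.
    """
    jumps = [
        predicted_rewards[t + 1] - predicted_rewards[t]
        for t in range(len(predicted_rewards) - 1)
    ]
    return max(range(len(jumps)), key=lambda t: jumps[t])


def reassign_rewards(rewards, key_index, window, bonus):
    """Add `bonus` to the key-action and to actions within `window` steps of it."""
    out = list(rewards)  # copy so the original episode is left unchanged
    lo = max(0, key_index - window)
    hi = min(len(out) - 1, key_index + window)
    for t in range(lo, hi + 1):
        out[t] += bonus
    return out
```

For an episode whose only reward arrives at the final step, `reassign_rewards(rewards, find_key_action(predictions), window=1, bonus=1.0)` spreads extra reward onto the detected key-action and its immediate neighbours before the transitions are stored for replay.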


Articles published on Hindsight Experience Replay

From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving

Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning

Deep Reinforcement Learning for the Navigation of Neurovascular Catheters

Goal-Oriented Dialogue Policy Learning from Failures

Guided goal generation for hindsight multi-goal reinforcement learning

Rewards Prediction-Based Credit Assignment for Reinforcement Learning With Sparse Binary Rewards
