Abstract

In reinforcement learning (RL), the reinforcement signal may be infrequent and delayed, not appearing immediately after the action that triggered the reward. Tracing back which sequence of actions contributes to a delayed reward, i.e., credit assignment (CA), is one of the biggest challenges in RL. This challenge is aggravated under sparse binary rewards, especially when a reward is given only after successful completion of the task. To this end, a novel method is proposed that combines key-action detection, among the sequence of actions performed in a task under sparse binary rewards, with a CA strategy. The key-action, defined as the most important action contributing to the reward, is detected by a deep neural network that predicts future rewards from environment information. The reward is then re-assigned to the key-action and its adjacent actions, defined as adjacent-key-actions. This re-assignment increases the success rate and convergence speed during training. For efficient re-assignment, two CA strategies are considered as part of the proposed method. The proposed method is combined with hindsight experience replay (HER) for experiments in the OpenAI Gym robotics suite. The experiments demonstrate that the proposed method detects key-actions and outperforms HER, increasing success rate and convergence speed, in the Fetch slide task, a task that is more exacting than the other Fetch tasks but is addressed by few publications in the literature. From the experiments, a guideline for selecting the CA strategy according to goal location is provided through goal-distribution analysis with a dot map.
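The re-assignment step described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: the function name, the `window` parameter, and the assumption that the key-action is identified by its time-step index are all ours.

```python
import numpy as np

def reassign_rewards(rewards, key_step, window=1, key_reward=1.0):
    """Hedged sketch: copy the episode's sparse success reward onto the
    detected key-action step and its neighbouring steps (the
    'adjacent-key-actions'), leaving all other rewards unchanged."""
    rewards = np.asarray(rewards, dtype=float).copy()
    lo = max(0, key_step - window)              # clip window at episode start
    hi = min(len(rewards), key_step + window + 1)  # clip window at episode end
    rewards[lo:hi] = key_reward                 # re-assign reward to key-action span
    return rewards
```

With a six-step episode rewarded only at the end, `reassign_rewards([0, 0, 0, 0, 0, 1], key_step=2)` spreads the reward over steps 1 through 3 while keeping the original terminal reward.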

Highlights

  • Reinforcement learning (RL) refers to a machine learning method that allows an agent to learn actions to achieve goals with minimum supervision by providing a reinforcement signal in the form of a negative or positive reward

  • The Fetch push task and the Fetch slide task with goals in the near zone, where the goal is achieved more effectively by a push action, can be classified as EDAR tasks, whereas the Fetch slide task with goals positioned in the far zone, requiring a hit action to achieve goals, can be classified as a non-EDAR task

  • The Fetch slide task with goals positioned in the far zone is referred to in this paper as the far-zone Fetch slide task


Summary

INTRODUCTION

Reinforcement learning (RL) refers to a machine learning method that allows an agent to learn actions to achieve goals with minimum supervision by providing a reinforcement signal in the form of a negative or positive reward. Many widely used off-policy algorithms, such as deep Q-networks [1], perform CA with neural networks. They allow an agent to learn a policy even when there is a delay between an action and the corresponding reward, by propagating the reward backward. Effective training is enabled through CA by assigning the delayed reward directly to the action that contributes to the achievement of the goal. The hindsight experience replay (HER) algorithm [20] effectively explores the environment by replacing the original goal of an episode with the actual result of that episode, which increases the frequency of the success reward under sparse binary rewards. A novel method comprising key-action detection and CA strategies is proposed in order to increase success rate and convergence speed during training in high-dimensional RL environments under sparse binary rewards.
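The HER goal-replacement idea can be illustrated with a short sketch of the "final" relabeling strategy: the goal actually achieved at the end of the episode is substituted for the original goal, and the sparse reward is recomputed against it. The function name, the episode dictionary layout, and the `reward_fn` callback are illustrative assumptions, not the paper's or Gym's API.

```python
def her_relabel(episode, reward_fn):
    """Hedged sketch of HER's 'final' strategy: relabel every transition of an
    episode with the goal achieved at the last step, then recompute the sparse
    reward for each step against that substituted goal."""
    achieved_final = episode["achieved_goals"][-1]  # hindsight goal = final outcome
    relabeled = []
    for obs, action, achieved in zip(episode["observations"],
                                     episode["actions"],
                                     episode["achieved_goals"]):
        r = reward_fn(achieved, achieved_final)  # sparse reward w.r.t. new goal
        relabeled.append((obs, achieved_final, action, r))
    return relabeled
```

Because the final step's achieved goal trivially matches the substituted goal, every relabeled episode contains at least one success reward, which is exactly how HER densifies sparse binary feedback.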

OFF-POLICY REINFORCEMENT LEARNING
PROPOSED ALGORITHM COMBINED WITH DDPG AND HER
Findings
CONCLUSION
