Abstract
The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult process. In this article, we propose a general and model-free reinforcement learning approach for learning robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function in order to minimize the vibration of the output action while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method can effectively solve the sparse reward problem and achieve a high learning speed.
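As a rough illustration (not the authors' exact implementation), the abstract's "action loss" can be read as an extra penalty term added to a TD3-style actor objective; the quadratic form of the penalty and the weight `action_loss_coef` below are assumptions made for the sketch.

```python
# Sketch only: TD3-style actor update with an additional "action loss" term.
# The squared-action penalty and `action_loss_coef` are illustrative assumptions.
import torch

def actor_loss(actor, critic, states, action_loss_coef=1.0):
    actions = actor(states)                    # deterministic policy pi(s)
    q_values = critic(states, actions)         # Q(s, pi(s)) from the first critic
    value_term = -q_values.mean()              # maximize the value of the action
    action_term = actions.pow(2).mean()        # discourage large, oscillating actions
    return value_term + action_loss_coef * action_term
```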
Highlights
Reinforcement learning (RL)[1] has shown impressive results in numerous simulated tasks, ranging from attaining superhuman performance in video games[2,3] and board games[4] to learning complex motion behaviors.[5,6] However, sparse rewards have always been a challenging problem for RL on robotic tasks.
Considering an agent interacting with an environment and assuming that the environment is fully observable, a Markov decision process is defined as a tuple $(S, A, p, r, \gamma)$, where $S$ is a set of states, $A$ is a set of actions, $p(s_{t+1} \mid s_t, a_t)$ are the transition probabilities, $r: S \times A \to \mathbb{R}$ is a reward function, and $\gamma \in [0, 1]$ is a discount factor.
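A minimal sketch of the discounted return implied by this definition, $G_t = \sum_k \gamma^k r_{t+k}$, is shown below; the $-1$-per-step sparse reward and the discount value are assumptions used only for the example.

```python
# Discounted return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
def discounted_return(rewards, gamma=0.98):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example with an (assumed) sparse -1-per-step reward over a 3-step failed episode:
# -1 + 0.98 * (-1) + 0.98**2 * (-1) = -2.9404
print(discounted_return([-1.0, -1.0, -1.0], gamma=0.98))
```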
The results show that, in both the push and pick-and-place tasks, Deep Deterministic Policy Gradient (DDPG) without the reward signal cannot learn effectively, which further confirms that traditional RL is heavily dependent on the reward function.
[Figure: success rate vs. epochs (1 epoch = 100 episodes = 100 * 50 steps) for the push and pick-and-place tasks.]
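For context, a hedged sketch of the kind of binary sparse reward typically used in goal-conditioned push and pick-and-place tasks is given below; the 5 cm success threshold is an assumption, not a value taken from the paper.

```python
# Sketch of a binary sparse reward for goal-conditioned robotic tasks.
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # 0 when the object lies within `threshold` of the goal, -1 otherwise,
    # so the agent receives no graded progress signal until it happens to succeed.
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance < threshold else -1.0
```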
Summary
Reinforcement learning (RL)[1] has shown impressive results in numerous simulated tasks, ranging from attaining superhuman performance in video games[2,3] and board games[4] to learning complex motion behaviors.[5,6] However, sparse rewards have always been a challenging problem for RL on robotic tasks. An improved HER method[9] combining curiosity with a priority mechanism has been proposed to improve both the performance and sample efficiency of HER. However, this method inherently assumes that real and hindsight experiences have the same effect and arbitrarily puts more focus on underrepresented achieved states. Another improvement of HER is ARCHER,[10] which compensates for the bias in HER by giving larger rewards to hindsight experiences, but this method may harm the final performance of the algorithm on some complex tasks. Other methods[8,31] combining RL with imitation learning (IL) use demonstrations to accelerate exploration in environments with sparse rewards, but they depend too heavily on the quality of the demonstrations, and their final performance is not satisfactory.
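As a rough sketch of the hindsight relabeling idea that HER and the variants discussed above build on (the "final" goal-sampling strategy and the transition layout here are assumptions, not the paper's code):

```python
# Hindsight relabeling ("final" strategy): every transition is stored again with
# the goal replaced by the goal actually achieved at the end of the episode,
# so that failed episodes still produce successful (informative) experiences.
def her_relabel(episode, compute_reward):
    """episode: list of dicts with keys
    'state', 'action', 'next_state', 'achieved_goal' (of next_state)."""
    new_goal = episode[-1]["achieved_goal"]          # what the agent actually reached
    relabeled = []
    for transition in episode:
        reward = compute_reward(transition["achieved_goal"], new_goal)
        relabeled.append({**transition, "goal": new_goal, "reward": reward})
    return relabeled
```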