Abstract

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult undertaking. In this article, we propose a general, model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to reduce oscillation in the output actions while maximizing their value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method effectively solves the sparse reward problem and learns quickly.
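
The abstract does not give the exact form of the action loss, so the following is only a minimal sketch of how such a term could be attached to a TD3-style actor objective. The names `actor`, `critic`, and `lambda_a`, and the choice of an L2 penalty on the action magnitude, are illustrative assumptions rather than the paper's formulation.

```python
# Minimal sketch (PyTorch): TD3-style actor update with an added action loss term.
# The L2 penalty on action magnitude and the weight lambda_a are assumptions;
# the paper's exact action-loss definition may differ.
import torch

def actor_loss(actor, critic, states, lambda_a=1e-2):
    actions = actor(states)                          # deterministic policy output
    q_values = critic(states, actions)               # critic's value of those actions
    value_term = -q_values.mean()                    # maximize Q  ->  minimize -Q
    action_term = lambda_a * (actions ** 2).mean()   # discourage large / jittery actions
    return value_term + action_term
```

In this sketch the extra term simply replaces the standard TD3 actor loss `-critic(s, actor(s)).mean()` with a version that also penalizes large action magnitudes, which is one plausible way to suppress output vibration.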

Highlights

  • Reinforcement learning (RL)[1] has shown impressive results in numerous simulated tasks, ranging from attaining superhuman performance in video games[2,3] and board games[4] to learning complex motion behaviors.[5,6] However, sparse reward has always been a challenging problem for RL on robotic tasks.

  • Considering an agent interacting with an environment and assuming that the environment is fully observable, a Markov decision process is defined as a tuple (S, A, p, r, γ), where S is a set of states, A is a set of actions, p(s_{t+1} | s_t, a_t) are the transition probabilities, r : S × A → ℝ is a reward function, and γ ∈ [0, 1] is a discount factor (the standard learning objective under this formulation is sketched after this list).

  • The results show that, in both the push and pick-and-place tasks, Deep Deterministic Policy Gradient (DDPG) without the reward signal cannot learn effectively, which further confirms that traditional RL is heavily dependent on the reward function (success rate plotted against epochs, where every epoch = 100 episodes = 100 * 50 steps).
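
Given the MDP tuple in the second highlight, the standard objective being optimized can be written as follows. These are the usual textbook definitions, not equations taken from the paper:

```latex
% Discounted return and action-value function for the MDP (S, A, p, r, \gamma).
% Standard definitions; the goal-conditioned setting used with HER additionally
% conditions the policy and reward on a goal.
\begin{aligned}
R_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}, a_{t+k}) \\
Q^{\pi}(s_t, a_t) &= \mathbb{E}_{\pi}\left[ R_t \mid s_t, a_t \right], \qquad
\pi^{*} = \arg\max_{\pi}\, \mathbb{E}_{\pi}\left[ R_0 \right]
\end{aligned}
```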


Summary

Introduction

Reinforcement learning (RL)[1] has shown impressive results in numerous simulated tasks, ranging from attaining superhuman performance in video games[2,3] and board games[4] to learning complex motion behaviors.[5,6] However, sparse rewards have always been a challenging problem for RL on robotic tasks. Hindsight Experience Replay (HER) alleviates this problem by replaying failed trajectories as if the goals they actually achieved had been the intended ones. An improved HER method[9] combining curiosity with a priority mechanism has been proposed to improve both the performance and sample efficiency of HER; however, it implicitly assumes that real and hindsight experiences have the same effect and arbitrarily puts more focus on underrepresented achieved states. Another improvement of HER is ARCHER,[10] which compensates for the bias in HER by giving larger rewards to hindsight experiences, but this may harm the final performance on some complex tasks. Other methods[8,31] that combine RL with imitation learning (IL) use demonstrations to accelerate exploration in environments with sparse rewards, but they depend heavily on the quality of the demonstrations, and their final performance is not satisfactory.
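
To make the relabeling idea behind HER concrete, here is a minimal sketch of the standard "final" relabeling strategy. It illustrates vanilla HER only, not the paper's Curious and Aggressive variant; the transition layout and the `compute_reward` helper are illustrative assumptions.

```python
# Minimal sketch of standard HER goal relabeling (the "final" strategy).
# Vanilla HER only, not the paper's Curious and Aggressive variant.
# The transition layout and compute_reward() are illustrative assumptions.

def compute_reward(achieved_goal, desired_goal, tol=0.05):
    """Sparse reward: 0 if the achieved goal is within tolerance, else -1."""
    close = all(abs(a - d) <= tol for a, d in zip(achieved_goal, desired_goal))
    return 0.0 if close else -1.0

def relabel_with_final_goal(episode):
    """episode: list of dicts with keys 'obs', 'action', 'achieved_goal', 'desired_goal'."""
    final_goal = episode[-1]["achieved_goal"]       # goal actually reached at episode end
    hindsight = []
    for t in episode:
        hindsight.append({
            "obs": t["obs"],
            "action": t["action"],
            "desired_goal": final_goal,              # pretend the achieved goal was intended
            "reward": compute_reward(t["achieved_goal"], final_goal),
        })
    return hindsight                                 # stored alongside the original transitions
```

Because the relabeled transitions receive non-negative reward whenever the achieved goal matches the substituted one, the replay buffer contains useful learning signal even when the original task goal is never reached.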

Background
Method
Initialize density model GMM
Conclusion
