Abstract
The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult process. In this article, we propose a general and model-free reinforcement learning approach for learning robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function in order to minimize the vibration of the output action while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method can effectively solve the sparse reward problem and achieve a high learning speed.
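As a rough illustration (not the authors' exact implementation), the abstract's "action loss" can be read as an extra penalty term added to a TD3-style actor objective; the quadratic form of the penalty and the weight `action_loss_coef` below are assumptions made for the sketch.

```python
# Sketch only: TD3-style actor update with an additional "action loss" term.
# The squared-action penalty and `action_loss_coef` are illustrative assumptions.
import torch

def actor_loss(actor, critic, states, action_loss_coef=1.0):
    actions = actor(states)                    # deterministic policy pi(s)
    q_values = critic(states, actions)         # Q(s, pi(s)) from the first critic
    value_term = -q_values.mean()              # maximize the value of the action
    action_term = actions.pow(2).mean()        # discourage large, oscillating actions
    return value_term + action_loss_coef * action_term
```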
Highlights
Reinforcement learning (RL)[1] has shown impressive results in numerous simulated tasks, ranging from attaining superhuman performance in video games[2,3] and board games[4] to learning complex motion behaviors.[5,6] However, sparse rewards have always been a challenging problem for RL on robotic tasks.
Considering an agent interacting with an environment and assuming that the environment is fully observable, a Markov decision process is defined as a tuple $(S, A, p, r, \gamma)$, where $S$ is a set of states, $A$ is a set of actions, $p(s_{t+1} \mid s_t, a_t)$ are the transition probabilities, $r: S \times A \to \mathbb{R}$ is a reward function, and $\gamma \in [0, 1]$ is a discount factor.
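A minimal sketch of the discounted return implied by this definition, $G_t = \sum_k \gamma^k r_{t+k}$, is shown below; the $-1$-per-step sparse reward and the discount value are assumptions used only for the example.

```python
# Discounted return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
def discounted_return(rewards, gamma=0.98):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example with an (assumed) sparse -1-per-step reward over a 3-step failed episode:
# -1 + 0.98 * (-1) + 0.98**2 * (-1) = -2.9404
print(discounted_return([-1.0, -1.0, -1.0], gamma=0.98))
```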
The results show that, in both the push and pick-and-place tasks, Deep Deterministic Policy Gradient (DDPG) without the reward signal cannot learn effectively, which further confirms that traditional RL is heavily dependent on the reward function.
[Figure: success rate vs. epochs (1 epoch = 100 episodes = 100 * 50 steps) for the push and pick-and-place tasks.]
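For context, a hedged sketch of the kind of binary sparse reward typically used in goal-conditioned push and pick-and-place tasks is given below; the 5 cm success threshold is an assumption, not a value taken from the paper.

```python
# Sketch of a binary sparse reward for goal-conditioned robotic tasks.
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # 0 when the object lies within `threshold` of the goal, -1 otherwise,
    # so the agent receives no graded progress signal until it happens to succeed.
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance < threshold else -1.0
```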
Summary
Reinforcement learning (RL)[1] has shown impressive results in numerous simulated tasks, ranging from attaining superhuman performance in video games[2,3] and board games[4] to learning complex motion behaviors.[5,6] However, sparse rewards have always been a challenging problem for RL on robotic tasks. An improved HER method[9] combining curiosity with a priority mechanism has been proposed to improve both the performance and sample efficiency of HER. However, this method inherently assumes that real and hindsight experiences have the same effect and arbitrarily puts more focus on underrepresented achieved states. Another improvement of HER is ARCHER,[10] which compensates for the bias in HER by giving larger rewards to hindsight experiences, but this method may harm the final performance of the algorithm on some complex tasks. Other methods[8,31] combining RL with imitation learning (IL) use demonstrations to accelerate exploration in environments with sparse rewards, but they depend too heavily on the quality of the demonstrations, and their final performance is not satisfactory.
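As a rough sketch of the hindsight relabeling idea that HER and the variants discussed above build on (the "final" goal-sampling strategy and the transition layout here are assumptions, not the paper's code):

```python
# Hindsight relabeling ("final" strategy): every transition is stored again with
# the goal replaced by the goal actually achieved at the end of the episode,
# so that failed episodes still produce successful (informative) experiences.
def her_relabel(episode, compute_reward):
    """episode: list of dicts with keys
    'state', 'action', 'next_state', 'achieved_goal' (of next_state)."""
    new_goal = episode[-1]["achieved_goal"]          # what the agent actually reached
    relabeled = []
    for transition in episode:
        reward = compute_reward(transition["achieved_goal"], new_goal)
        relabeled.append({**transition, "goal": new_goal, "reward": reward})
    return relabeled
```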