Effective Policy Adjustment via Meta-Learning for Complex Manipulation Tasks

Binghong Wu, Xin Cai, Tong Wang, Xuesong Tang, Kuangrong Hao

doi:10.1109/cac.2018.8623652

Abstract

The ability to adjust a policy is key to learning decision making when agents complete complex manipulation tasks. To address this problem while considering both exploration and exploitation, we propose a novel deep reinforcement learning algorithm that combines Hindsight Experience Replay (HER) with Model-Agnostic Meta-Learning (MAML). In environments where rewards are sparse and binary, HER provides relatively effective exploration by converting a single-goal task into a multi-goal one, which enhances the ability to search for better policies using not only successful transition trajectories but also failed ones. MAML promotes exploitation: the proposed algorithm learns faster and can adjust the policy model from limited experience within a few iterations. Extensive simulations on complex object-manipulation tasks with a robotic arm show that HER integrated with MAML accelerates fine-tuning of the original policy-gradient reinforcement learning method with a neural network policy and also improves the success rate.
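The goal-relabeling idea behind HER can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D toy task, the 0/-1 reward convention, the tolerance `tol`, the "final" relabeling strategy, and the names `sparse_reward` and `her_relabel_final` are all assumptions chosen for illustration.

```python
import math

def sparse_reward(achieved, goal, tol=0.05):
    """Sparse binary reward (assumed convention): 0 on success, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def her_relabel_final(episode, tol=0.05):
    """Copy an episode, replacing its goal with the state actually reached
    at the end of the episode, and recompute every reward for that goal."""
    new_goal = episode[-1]["achieved_next"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "obs": t["obs"],
            "action": t["action"],
            "achieved_next": t["achieved_next"],
            "goal": new_goal,
            "reward": sparse_reward(t["achieved_next"], new_goal, tol),
        })
    return relabeled

# Toy failed episode: a point moves right in 1-D but never reaches the goal.
goal = (1.0,)
positions = [(0.0,), (0.1,), (0.2,), (0.3,)]
episode = [
    {
        "obs": p,
        "action": (0.1,),
        "achieved_next": p_next,
        "goal": goal,
        "reward": sparse_reward(p_next, goal),
    }
    for p, p_next in zip(positions[:-1], positions[1:])
]

relabeled = her_relabel_final(episode)
original_rewards = [t["reward"] for t in episode]      # every step fails
hindsight_rewards = [t["reward"] for t in relabeled]   # last step succeeds
print(original_rewards, hindsight_rewards)
```

In this sketch the original episode carries no success signal at all, while the relabeled copy ends in a success for the substituted goal. Storing both in a replay buffer is how a failed trajectory can still contribute useful learning signal under sparse rewards.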

Full Text