Abstract

The goal of imitation learning (IL) is to enable a robot to imitate expert behavior given expert demonstrations. Adversarial imitation learning (AIL) is a recent and successful IL framework that has shown significant progress on complex continuous tasks, particularly robotic tasks. In most cases, however, acquiring high-quality demonstrations is costly and laborious, which poses a significant challenge for AIL methods. Although generative adversarial imitation learning (GAIL) and its extensions have been shown to be robust to sub-optimal experts, it is difficult for them to surpass expert performance by a large margin. To address this issue, this paper proposes a novel off-policy AIL method called robust adversarial imitation learning (RAIL). To enable the agent to significantly outperform the sub-optimal expert providing the demonstrations, the hindsight idea of variable reward (VR) is first incorporated into the off-policy AIL framework. A strategy called hindsight copy (HC) of demonstrations is then designed to supply the discriminator and the trained policy with different versions of the demonstrations, maximizing the use of the demonstrations and speeding up learning. Experiments were conducted on two multi-goal robotic tasks to evaluate the proposed method. The results show that our method is not limited by the quality of the expert demonstrations and outperforms other IL approaches.
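
To make the hindsight ideas above concrete, the sketch below shows HER-style "future" relabeling applied to goal-conditioned demonstrations. This is a minimal illustration, not the paper's implementation: the Transition fields, the hindsight_copy name, and the split of original versus relabeled demonstrations between the discriminator and the policy's replay buffer are all assumptions made for exposition.

import copy
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    obs: list            # observation at this step
    action: list         # expert action taken
    achieved_goal: list  # goal actually reached after this step
    desired_goal: list   # goal the expert was originally pursuing

def hindsight_copy(demo: List[Transition]) -> List[Transition]:
    """Return a relabeled copy of one demonstration trajectory.

    Each transition's desired goal is replaced by a goal achieved
    later in the same trajectory (the 'future' strategy), so even a
    sub-optimal demonstration is successful with respect to the
    substituted goal. The original trajectory is left untouched.
    """
    relabeled = copy.deepcopy(demo)
    for t, tr in enumerate(relabeled):
        # Sample a goal achieved at or after step t in this trajectory.
        future = random.randint(t, len(relabeled) - 1)
        tr.desired_goal = relabeled[future].achieved_goal
    return relabeled

# Usage sketch: keep the original demonstrations for one component of
# the AIL pipeline and feed the hindsight copies to the other, so the
# two components see different but consistent views of the same data
# (one plausible split; the paper's exact assignment may differ):
#   original demo      -> discriminator's expert buffer
#   hindsight_copy(...) -> off-policy learner's replay buffer
demo = [Transition(obs=[0.0], action=[0.1], achieved_goal=[g], desired_goal=[1.0])
        for g in (0.2, 0.5, 0.9)]
relabeled_demo = hindsight_copy(demo)

The design point this illustrates is that relabeling turns a single sub-optimal trajectory into additional "successful" data for free, which is how the agent can learn to exceed the expert rather than merely match it.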
