Abstract

Reinforcement learning (RL) and imitation learning (IL), especially equipped with deep neural networks, have been widely studied for autonomous robotic skill acquisition and control tasks. However, these methods and their extensions require extensive environmental interactions during training, which greatly prevents them from being applied to real-world robots. To alleviate this problem, we present an efficient model-free off-policy actor-critic algorithm for robotic skill acquisition and continuous control, by fusing the task reward with a task-oriented guiding reward, which is formulated by leveraging few and imperfect expert demonstrations. In this framework, the agent can explore the environment more intentionally, thus sampling efficiency can be achieved; moreover, the agent can also exploit the experience more effectively, thereby substantially improved performance can be realized simultaneously. The empirical results on robotic locomotion tasks show that the proposed scheme can lower sample complexity by 2-10 times in contrast with the state-of-the-art baseline deep RL (DRL) algorithms, while achieving performance better than that of the expert. Furthermore, the proposed algorithm achieves significant improvement in both sampling efficiency and asymptotic performance on tasks with sparse and delayed reward, wherein those baseline DRL algorithms struggle to make progress. This takes a substantial step forward to implement these methods to acquire skills autonomously for real robots.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call