Abstract

Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without online data collection. Although it offers the possibility of surpassing the performance of the datasets, most existing offline RL algorithms struggle to compete with behavior cloning policies in many dataset settings, because they must trade off policy improvement against the additional regularization needed to address distributional shift. In many cases, if one can imitate a sequence of sub-optimal sub-trajectories in the data and properly "stitch" them toward reaching an ideal future state, the result may be a more reliable policy that avoids the difficulties present in typical value-based offline RL algorithms. We borrow the idea of curriculum learning to embody the above intuition. We construct a curriculum that progressively imitates a sequence of sub-optimal trajectories conditioned on a series of carefully constructed future states and cumulative rewards as goals. The sub-optimal trajectories gradually guide policy learning toward reaching the ideal goal states. We name our algorithm Curriculum Goal-conditioned Imitation (CGI). Experimental results show that CGI achieves competitive performance against state-of-the-art offline RL algorithms, especially on challenging tasks with long horizons and sparse rewards.
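The following is a minimal sketch of the goal-conditioned imitation idea described above, assuming a PyTorch setup with continuous actions. The class and function names (GoalConditionedPolicy, sample_batch, train), the goal relabeling scheme, and the fixed horizon schedule are illustrative assumptions, not the paper's implementation; CGI constructs its sequence of goal states and cumulative rewards more carefully than this simple schedule.

# Sketch: goal-conditioned behavior cloning with a curriculum over goal horizons.
# Goals are relabeled from the dataset as (future state, reward accumulated
# until that state); later curriculum stages use longer horizons so goals
# move toward ideal final states. Names and schedule are illustrative only.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g), where the goal g = (future state, cumulative reward)."""
    def __init__(self, state_dim, action_dim, goal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def sample_batch(trajectories, horizon, batch_size=256):
    """Relabel goals from data: for state s_t, use the state `horizon` steps
    ahead plus the reward accumulated over those steps as the goal.
    Assumes each trajectory is a dict of tensors/lists longer than `horizon`."""
    states, actions, goals = [], [], []
    for _ in range(batch_size):
        traj = trajectories[torch.randint(len(trajectories), (1,)).item()]
        t = torch.randint(0, len(traj["obs"]) - horizon, (1,)).item()
        g_state = traj["obs"][t + horizon]
        g_return = float(sum(traj["rew"][t:t + horizon]))
        states.append(traj["obs"][t])
        actions.append(traj["act"][t])
        goals.append(torch.cat([g_state, torch.tensor([g_return])]))
    return torch.stack(states), torch.stack(actions), torch.stack(goals)

def train(policy, trajectories, horizons=(10, 50, 200), steps_per_stage=10_000):
    """Curriculum: imitate under short-horizon goals first, then progressively
    extend the horizon so relabeled goals approach the ideal goal states."""
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for horizon in horizons:                              # curriculum stages
        for _ in range(steps_per_stage):
            s, a, g = sample_batch(trajectories, horizon)
            loss = ((policy(s, g) - a) ** 2).mean()       # behavior-cloning MSE
            opt.zero_grad()
            loss.backward()
            opt.step()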
