Solving complex long-horizon tasks with Reinforcement Learning (RL) from scratch is difficult, largely because exploration must be efficient. Two common approaches to reducing complexity and improving exploration efficiency are (i) combining learning-from-demonstration techniques with online RL, where prior knowledge acquired from demonstrations is used to guide exploration, refine representations, or shape reward functions, and (ii) using representation learning to enable state abstraction. In this study, we present Skill-Conditioned Online REinforcement Learning (SCORE), a novel approach that combines these two strategies by using skills acquired from an unstructured demonstration dataset within a policy gradient RL algorithm. This integration provides the algorithm with informative input representations, which improves downstream task learning and exploration efficiency. We evaluate our method on long-horizon robotic and navigation tasks, as well as game environments, and show improved online RL performance compared to the baselines. Furthermore, we demonstrate the generalization capabilities of our approach and analyze its effectiveness through an ablation study.