Abstract

In this article, we study a new reinforcement learning (RL) setting in which the environment is nonrewarding and contains several, possibly related, objects of varying controllability, where an apt agent, Bob, pursues its own goals without necessarily providing helpful demonstrations, and where the objective of the learning agent is to learn to control objects individually. We present a generic discrete-state, discrete-action model of such environments, and an unsupervised RL agent called CLIC (Curriculum Learning and Imitation for Control) that achieves the desired objective. CLIC selects which objects to focus on, both when training and when imitating, by maximizing its learning progress. We show that CLIC can effectively observe Bob to gain control of objects faster, even though Bob is not explicitly teaching. Despite choosing what it imitates in a principled way, CLIC retains the natural ability to follow Bob when he provides ordered demonstrations. Finally, we show that, compared with a noncurriculum-based agent, CLIC achieves faster mastery of the environment when Bob controls objects that the agent cannot, or in the presence of a hierarchy between objects, by ignoring nonreproducible and already-mastered interactions with objects when imitating.
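To make the selection mechanism concrete, here is a minimal sketch of learning-progress-based curriculum selection over a discrete set of objects. This is an illustration under stated assumptions, not the authors' reference implementation: all names (`LearningProgressSelector`, `record`, `select`) are hypothetical, and it assumes the agent can observe a per-object success signal after each control or imitation attempt.

```python
import random
from collections import deque

class LearningProgressSelector:
    """Hypothetical sketch: pick which object to practice or imitate
    by maximizing absolute learning progress.

    For each object we keep a window of recent success outcomes;
    learning progress is the change in success rate between the older
    and newer halves of the window. Objects that are already mastered
    (flat at high success) and objects that are uncontrollable
    (flat at low success) both show near-zero progress, so the
    selector naturally ignores them.
    """

    def __init__(self, object_ids, window=20, eps=0.1):
        self.history = {o: deque(maxlen=window) for o in object_ids}
        self.eps = eps  # exploration floor so no object is starved

    def record(self, obj, success):
        # Log the outcome of one attempt to control `obj`.
        self.history[obj].append(1.0 if success else 0.0)

    def _progress(self, obj):
        h = list(self.history[obj])
        if len(h) < 4:
            return 1.0  # optimistic init: try unexplored objects first
        half = len(h) // 2
        older = sum(h[:half]) / half
        recent = sum(h[half:]) / (len(h) - half)
        return abs(recent - older)  # |change in competence|

    def select(self):
        objs = list(self.history)
        if random.random() < self.eps:
            return random.choice(objs)  # occasional uniform exploration
        # Greedy on learning progress; proportional sampling also works.
        return max(objs, key=self._progress)
```

In use, the agent would call `select()` to choose which object's goal to pursue (or which of Bob's interactions to imitate) next, then `record()` the outcome, so the curriculum shifts toward objects where competence is currently changing fastest.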
