Abstract

In this article, we study a new reinforcement learning (RL) setting in which the environment is nonrewarding and contains several, possibly related, objects of varying controllability, where an apt agent, Bob, pursues its own goals without necessarily providing helpful demonstrations, and where the objective of the learning agent is to learn to control objects individually. We present a generic discrete-state, discrete-action model of such environments, and an unsupervised RL agent called CLIC (Curriculum Learning and Imitation for Control) that achieves the desired objective. CLIC selects which objects to focus on, both when training and when imitating, by maximizing its learning progress. We show that CLIC can effectively observe Bob to gain control of objects faster, even though Bob is not explicitly teaching. Despite choosing what it imitates in a principled way, CLIC retains the natural ability to follow Bob when he provides ordered demonstrations. Finally, we show that, compared with a noncurriculum-based agent, CLIC achieves faster mastery of the environment when Bob controls objects that the agent cannot, or in the presence of a hierarchy between objects, by ignoring nonreproducible and already-mastered interactions with objects when imitating.
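To make the selection mechanism concrete, here is a minimal sketch of learning-progress-based curriculum selection over a discrete set of objects. This is an illustration under stated assumptions, not the authors' reference implementation: all names (`LearningProgressSelector`, `record`, `select`) are hypothetical, and it assumes the agent can observe a per-object success signal after each control or imitation attempt.

```python
import random
from collections import deque

class LearningProgressSelector:
    """Hypothetical sketch: pick which object to practice or imitate
    by maximizing absolute learning progress.

    For each object we keep a window of recent success outcomes;
    learning progress is the change in success rate between the older
    and newer halves of the window. Objects that are already mastered
    (flat at high success) and objects that are uncontrollable
    (flat at low success) both show near-zero progress, so the
    selector naturally ignores them.
    """

    def __init__(self, object_ids, window=20, eps=0.1):
        self.history = {o: deque(maxlen=window) for o in object_ids}
        self.eps = eps  # exploration floor so no object is starved

    def record(self, obj, success):
        # Log the outcome of one attempt to control `obj`.
        self.history[obj].append(1.0 if success else 0.0)

    def _progress(self, obj):
        h = list(self.history[obj])
        if len(h) < 4:
            return 1.0  # optimistic init: try unexplored objects first
        half = len(h) // 2
        older = sum(h[:half]) / half
        recent = sum(h[half:]) / (len(h) - half)
        return abs(recent - older)  # |change in competence|

    def select(self):
        objs = list(self.history)
        if random.random() < self.eps:
            return random.choice(objs)  # occasional uniform exploration
        # Greedy on learning progress; proportional sampling also works.
        return max(objs, key=self._progress)
```

In use, the agent would call `select()` to choose which object's goal to pursue (or which of Bob's interactions to imitate) next, then `record()` the outcome, so the curriculum shifts toward objects where competence is currently changing fastest.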
