Abstract
In open-ended continuous environments, robots need to learn multiple parameterised control tasks through hierarchical reinforcement learning. We hypothesise that the most complex tasks can be learned more easily by transferring knowledge from simpler tasks, and faster by adapting the complexity of the actions to the task. We propose a task-oriented representation of complex actions, called procedures, to learn task relationships online, and unbounded sequences of action primitives to control the different observables of the environment. Combining goal-babbling with imitation learning, and active learning with transfer of knowledge based on intrinsic motivation, our algorithm self-organises its learning process. At any given time, it chooses a task to focus on, and what, how, when and from whom to transfer knowledge. We show with a simulation and a real industrial robot arm, in cross-task and cross-learner transfer settings, that task composition is key to tackling highly complex tasks. Task decomposition is also efficiently transferred across learners with different embodiments and by active imitation, where the robot requests only a small number of demonstrations and the adequate type of information. The robot learns and exploits task dependencies so as to learn tasks of every complexity.
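To make the notion of a procedure concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the task names, the two-subtask `Procedure` structure, and the `expand` helper are hypothetical assumptions about how a complex goal could be recursively rewritten into simpler subtask goals and, ultimately, a flat sequence of learned action primitives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Goal:
    task: str        # which observable of the environment to control
    params: tuple    # desired outcome for that observable

@dataclass(frozen=True)
class Procedure:
    first: Goal      # subtask goal to achieve first
    second: Goal     # subtask goal to achieve second

# Illustrative library: the complex task decomposes into two simpler goals,
# while "move_stick" is achieved directly by a learned primitive sequence.
# In the real system the primitives would also depend on goal.params; here
# they are fixed placeholders.
library = {
    "play_two_keys": Procedure(Goal("move_stick", (0.2, 0.1)),
                               Goal("move_stick", (0.4, 0.1))),
    "move_stick": [[0.0, 0.5, 1.0]],   # placeholder primitive parameters
}

def expand(goal):
    """Recursively expand a goal into the flat primitive sequence executing it."""
    entry = library[goal.task]
    if isinstance(entry, Procedure):
        return expand(entry.first) + expand(entry.second)
    return entry   # base case: a directly learned list of action primitives

print(expand(Goal("play_two_keys", ())))   # two primitives, one per key press
```

Because a procedure is expressed in task (outcome) space rather than in motor space, a decomposition of this kind could in principle be communicated between learners with different bodies, which is consistent with the cross-learner transfer claim above.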
Highlights
Let us consider a reinforcement learning (RL) [1] robot placed in an environment surrounded by objects, without external rewards, but with human experts’ help.
We examine how task decomposition can be learned and transferred from a teacher or another learner, using the mechanisms of intrinsic motivation in autonomous exploration and active imitation learning to discover task hierarchies for cross-task and cross-learner transfer learning.
While in [35] we showed faster learning and better control precision on a toy simulation, in this article we show on an industrial robot that task decomposition is pivotal to completing tasks of higher complexity, and we test the properties of our active imitation of task decomposition: it is valid for cross-learner transfer even in the case of different embodiments, and active imitation proves more efficient than imitation of a batch dataset provided at initialisation.
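The highlights refer to intrinsic motivation as the mechanism driving exploration and imitation requests. As a hedged illustration of one common realisation of that idea, the sketch below selects the task with the highest recent competence progress; the sliding window, the epsilon parameter and the progress formula are assumptions for illustration, not the paper's exact interest measure.

```python
import random
from collections import defaultdict, deque

class ProgressBasedSelector:
    """Chooses which task to practise next from empirical learning progress.

    Hypothetical sketch: each task is scored by the change in average
    competence between the older and the more recent half of a sliding
    window, a common intrinsic-motivation measure (not necessarily the
    authors' exact formula).
    """

    def __init__(self, tasks, window=10, eps=0.1):
        self.tasks = list(tasks)
        self.eps = eps                                  # residual random exploration
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, task, competence):
        """Log the competence (e.g. negative goal-reaching error) on a task."""
        self.history[task].append(competence)

    def progress(self, task):
        h = list(self.history[task])
        if len(h) < 2:
            return float("inf")        # unexplored tasks look maximally interesting
        half = len(h) // 2
        older = sum(h[:half]) / half
        recent = sum(h[half:]) / (len(h) - half)
        return abs(recent - older)     # absolute competence progress

    def choose(self):
        if random.random() < self.eps:
            return random.choice(self.tasks)            # occasional random pick
        return max(self.tasks, key=self.progress)

selector = ProgressBasedSelector(["move_stick", "play_two_keys"])
selector.record("move_stick", -0.8)
selector.record("move_stick", -0.3)    # competence improves -> high progress
print(selector.choose())
```

The same interest measure could, in principle, arbitrate not only between tasks but also between strategies (autonomous exploration versus requesting a demonstration), which is one way to read the "what, how, when and from whom" choices described in the abstract.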
Summary
Let us consider a reinforcement learning (RL) [1] robot placed in an environment surrounded by objects, without external rewards, but with human experts’ help. For tasks of various complexities and dimensionalities, without a priori domain knowledge, the complexity of the actions considered should be unbounded. If an action primitive of dimension n is sufficient to place an object at a position, then a sequence of 2 primitives, i.e. an action of dimension 2n, is sufficient to place the stick on 2 xylophone keys. We consider that actions of unbounded complexity can be expressed as action primitives and unbounded sequences of action primitives, named respectively micro and compound actions in [2]. The agent needs to estimate the complexity of the task and deploy actions of the corresponding complexity.
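A minimal sketch of this dimensionality argument, assuming a primitive is a fixed-length parameter vector (the constant `N_PRIM` and the helper names are illustrative, not taken from the paper):

```python
import numpy as np

N_PRIM = 4   # illustrative primitive dimension "n"; the real value is robot-specific

def micro_action(params):
    """A micro action: a single primitive, i.e. a parameter vector of dimension n."""
    a = np.asarray(params, dtype=float)
    assert a.shape == (N_PRIM,), "a primitive has fixed dimension n"
    return a

def compound_action(primitive_list):
    """A compound action: an unbounded sequence of k primitives, dimension k * n."""
    return np.concatenate([micro_action(p) for p in primitive_list])

# Placing one object needs a single primitive (dimension n), while playing
# two xylophone keys needs a sequence of two primitives (dimension 2n):
one_key = compound_action([np.zeros(N_PRIM)])
two_keys = compound_action([np.zeros(N_PRIM), np.zeros(N_PRIM)])
assert one_key.size == N_PRIM and two_keys.size == 2 * N_PRIM
```

Since the sequence length k is unbounded, the effective action space grows with the task, which is why the agent must estimate a task's complexity before committing to actions of a given length.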