Abstract

We aim at a robot capable of learning sequences of actions to achieve a range of complex tasks. In this paper, we consider the learning of a set of interrelated complex tasks that are hierarchically organized. To learn this mapping between a continuous high-dimensional space of tasks and an infinite-dimensional space of unbounded sequences of actions, we introduce a new framework called “procedures”, which enables the autonomous discovery of how to combine previously learned skills in order to learn increasingly complex combinations of motor policies. We propose an active-learning algorithmic architecture capable of organizing its learning process so as to achieve a range of complex tasks by learning sequences of primitive motor policies. Based on heuristics of active imitation learning, goal babbling and strategic learning driven by intrinsic motivation, our architecture leverages the procedures framework to actively decide, during its learning process, which outcome to focus on and which exploration strategy to apply. We show in a simulated environment that our new architecture is capable of tackling the learning of complex motor policies by adapting the complexity of its policies to the task at hand. We also show that procedures enable the learning agent to discover the task hierarchy and to exploit its experience of previously learned skills when learning new complex tasks.
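The strategic-learning component described above can be illustrated with a minimal sketch. The learner tracks its recent error on each (outcome space, exploration strategy) pair and preferentially selects the pair showing the highest measured competence progress. All class and method names here are illustrative assumptions, not the paper's actual architecture; the real system combines this greedy criterion with additional exploration.

```python
# Hedged sketch of intrinsic-motivation-based strategic learning:
# choose which outcome space to focus on and which strategy to apply
# by maximizing recent competence progress (decrease in error).
class StrategicLearner:
    def __init__(self, outcome_spaces, strategies, window=5):
        self.window = window
        # one error history per (outcome space, strategy) pair
        self.history = {(o, s): [] for o in outcome_spaces for s in strategies}

    def record(self, outcome_space, strategy, error):
        self.history[(outcome_space, strategy)].append(error)

    def progress(self, key):
        errs = self.history[key]
        if len(errs) < 2 * self.window:
            return float("inf")  # under-sampled pairs stay attractive
        old = sum(errs[-2 * self.window:-self.window]) / self.window
        new = sum(errs[-self.window:]) / self.window
        return old - new  # positive when error is decreasing

    def choose(self):
        # greedy choice over all (outcome space, strategy) pairs
        return max(self.history, key=self.progress)
```

For instance, if errors on one outcome space keep dropping under imitation while every other pair stagnates, `choose()` returns that pair, focusing the learner's next episodes where it currently progresses most.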

Highlights

  • Efforts in both the robotics industry and academia have been made to integrate robots into environments previously reserved for humans

  • We examine methods for robots to learn sequences of motor policies, methods for multi-task learning, and heuristics for learning high-dimensional mappings, such as active learning based on intrinsic motivation, social guidance and strategic learning

  • The algorithms capable of performing procedures (IM-PB and Socially Guided Intrinsic Motivation with Procedure Babbling (SGIM-PB)) reach error levels lower than their non-procedure equivalents (SAGG-RIAC and SGIM-ACTS)

Summary

Introduction

Efforts in both the robotics industry and academia have been made to integrate robots into environments previously reserved for humans. In such a context, the ability for service robots to continuously learn new tasks, autonomously or guided by their human counterparts, has become necessary. These robots would be needed to carry out multiple tasks, especially in open environments, which is still an ongoing challenge in robot learning. The range of tasks such robots need to learn can be wide and can even change after the robot's deployment. These tasks can require the execution of complex policies, such as sequences of primitive policies. Learning to associate potentially unbounded sequences of policies to an infinite set of tasks is a challenging problem, because of the high dimensionality of the policy and state spaces, the multi-task setting, and the unbounded, continuous and loosely specified environments.
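The idea of building complex policies as sequences of primitive policies, and of procedures as combinations of previously learned subtasks, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the subtask names, the scalar "state", and the `procedure` helper are all assumptions made for the example.

```python
# Toy sketch: a complex task is achieved by chaining the primitive-policy
# sequences already learned for two simpler subtasks (a "procedure").

def execute(policy_sequence, state):
    """Apply a sequence of primitive policies; each maps a state to a new state."""
    for primitive in policy_sequence:
        state = primitive(state)
    return state

def procedure(skills, subtask_a, subtask_b):
    """Combine two learned subtasks by concatenating their policy sequences,
    yielding a longer sequence for the composed, more complex task."""
    return skills[subtask_a] + skills[subtask_b]

# hypothetical learned skills: primitives acting on a scalar state
skills = {
    "reach": [lambda s: s + 1],                    # one primitive policy
    "grasp": [lambda s: s * 2, lambda s: s - 1],   # two primitive policies
}

combined = procedure(skills, "reach", "grasp")
result = execute(combined, 0)  # (0 + 1) * 2 - 1 = 1
```

The point of the sketch is that the composed policy's length grows with the task hierarchy: the agent never has to search the full space of unbounded action sequences from scratch, it reuses the sequences it already knows.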
