Abstract

A promising idea for scaling robot learning to more complex tasks is to use elemental behaviors as building blocks for composing more complex behavior. Ideally, such building blocks are combined with a learning algorithm that can learn to select, adapt, sequence, and co-activate them. While many approaches support one of these requirements, no learning algorithm unifies all of these properties in a single framework. In this paper, we present our work on a unified approach for learning such a modular control architecture. We introduce new policy search algorithms that are based on information-theoretic principles and can learn to select, adapt, and sequence the building blocks. Furthermore, we develop a new representation for the individual building blocks that supports co-activation and principled ways of adapting the movement. Finally, we summarize our experiments on learning modular control architectures in simulation and with real robots.

Highlights

  • Robot learning approaches such as policy search methods (Kober and Peters, 2010; Kormushev et al., 2010; Theodorou et al., 2010) have been very successful: Kormushev et al. (2010) learned to flip pancakes and Kober and Peters (2010) learned the game ball-in-the-cup

  • Probabilistic movement primitives: in the second part of this paper, we investigate new representations for the individual building blocks of movement that are suited for use in a modular control architecture

  • As we focused on the representation of the individual building blocks, we evaluated the new representation without reinforcement learning and learned the Probabilistic Movement Primitive (ProMP) by imitation (see the sketch after this list)
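
As a rough illustration of the imitation learning step mentioned in the last highlight, the following Python sketch fits a ProMP from demonstrated trajectories: each demonstration is projected onto basis functions by ridge regression, and a Gaussian is fit over the resulting weight vectors. The basis type, widths, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def basis_matrix(z, n_basis=10, width=0.02):
    """Normalized Gaussian bases evaluated at phase values z (shape: [T, n_basis])."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * (z[:, None] - centers[None, :]) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)

def fit_promp(demos, n_basis=10, ridge=1e-6):
    """Fit a weight vector per demonstration by ridge regression,
    then a Gaussian over the weight vectors."""
    weights = []
    for y in demos:                      # y: [T] positions of one demonstration
        z = np.linspace(0.0, 1.0, len(y))
        phi = basis_matrix(z, n_basis)
        w = np.linalg.solve(phi.T @ phi + ridge * np.eye(n_basis), phi.T @ y)
        weights.append(w)
    W = np.stack(weights)                # [n_demos, n_basis]
    mu_w = W.mean(axis=0)
    sigma_w = np.cov(W, rowvar=False) + ridge * np.eye(n_basis)
    return mu_w, sigma_w
```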

Summary

INTRODUCTION

Robot learning approaches such as policy search methods (Kober and Peters, 2010; Kormushev et al., 2010; Theodorou et al., 2010) have been very successful: Kormushev et al. (2010) learned to flip pancakes and Kober and Peters (2010) learned the game ball-in-the-cup. Using a probabilistic model-fitting approach to compute the policy update has the important advantage that we can draw on a large toolbox of algorithms for estimating structured probabilistic models, such as the expectation-maximization algorithm (Dempster et al., 1977) or variational inference (Neal and Hinton, 1998), and it does not require a user-specified learning rate. Some imitation learning approaches fit a Gaussian process model to represent the policy of a hidden state. The advantage of these imitation learning approaches is that we can estimate the temporal structure of the modular control policy, i.e., when to switch from one building block to the next. Estimating the duration of the building blocks from the given trajectory data seems to be a fruitful and more general approach.
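
To make the model-fitting view of the policy update concrete, here is a minimal Python sketch of an episodic, information-theoretic update in the spirit of REPS: sampled policy parameters are re-weighted by their exponentiated returns, with the temperature set by a KL bound rather than a hand-tuned learning rate, and the new Gaussian search distribution is obtained by a weighted maximum-likelihood fit. The function names, the bound `epsilon`, and the use of SciPy's optimizer are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def reps_weights(returns, epsilon=0.5):
    """Solve the REPS dual for the temperature eta and return sample weights."""
    r = returns - returns.max()          # shift returns for numerical stability
    def dual(v):
        eta = np.exp(v[0])               # optimize log(eta) so eta stays positive
        return eta * epsilon + eta * np.log(np.mean(np.exp(r / eta)))
    eta = np.exp(minimize(dual, x0=[0.0]).x[0])
    w = np.exp(r / eta)
    return w / w.sum()

def update_gaussian_policy(params, returns, epsilon=0.5):
    """Weighted maximum-likelihood refit of a Gaussian over policy parameters.

    params: [n_samples, dim] sampled parameter vectors; returns: [n_samples].
    """
    w = reps_weights(returns, epsilon)
    mu = w @ params
    diff = params - mu
    cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(params.shape[1])
    return mu, cov
```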

INFORMATION THEORETIC POLICY SEARCH FOR LEARNING MODULAR CONTROL POLICIES
LEARNING TO SELECT THE BUILDING BLOCKS
Experimental evaluation of the selection of building blocks: robot tetherball
LEARNING TO SEQUENCE THE BUILDING BLOCKS
PROBABILISTIC MOVEMENT PRIMITIVES
PROBABILISTIC TRAJECTORY REPRESENTATION
Adaptation of the building blocks by conditioning
Combination and blending by multiplying distributions
CONCLUSION AND FUTURE WORK
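
The two subsection headings above on adaptation and combination refer to standard Gaussian operations on the ProMP weight distribution. As a brief, hypothetical illustration (variable names and the noise level are assumptions): conditioning adapts a primitive so that it passes through a desired via-point, and multiplying two Gaussian state distributions blends two co-activated primitives.

```python
import numpy as np

def condition_promp(mu_w, sigma_w, phi_t, y_star, sigma_y=1e-4):
    """Condition the weight distribution N(mu_w, sigma_w) on observing
    y_star at time t, where phi_t holds the basis function values at t."""
    tmp = sigma_w @ phi_t                     # [n_basis]
    gain = tmp / (sigma_y + phi_t @ tmp)      # Kalman-style gain vector
    mu_new = mu_w + gain * (y_star - phi_t @ mu_w)
    sigma_new = sigma_w - np.outer(gain, tmp)
    return mu_new, sigma_new

def blend_gaussians(mu1, var1, mu2, var2):
    """Product of two Gaussian state distributions: the blended primitive
    follows each component where its variance (uncertainty) is low."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var
```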