Abstract

A promising idea for scaling robot learning to more complex tasks is to use elemental behaviors as building blocks for composing more complex behavior. Ideally, such building blocks are combined with a learning algorithm that can learn to select, adapt, sequence, and co-activate them. While many approaches support one of these requirements, no learning algorithm unifies all of these properties in a single framework. In this paper, we present our work on a unified approach for learning such a modular control architecture. We introduce new policy search algorithms that are based on information-theoretic principles and can learn to select, adapt, and sequence the building blocks. Furthermore, we develop a new representation for the individual building blocks that supports co-activation and principled ways of adapting the movement. Finally, we summarize our experiments on learning modular control architectures in simulation and with real robots.

Highlights

  • Robot learning approaches such as policy search methods (Kober and Peters, 2010; Kormushev et al., 2010; Theodorou et al., 2010) have been very successful: Kormushev et al. (2010) learned to flip pancakes and Kober and Peters (2010) learned the game ball-in-the-cup

  • Probabilistic movement primitives: in the second part of this paper, we investigate new representations for the individual building blocks of movement that are suited for use in a modular control architecture

  • As we focused on the representation of the individual building blocks, we evaluated the new representation without reinforcement learning and learned the Probabilistic Movement Primitive (ProMP) by imitation (see the sketch after this list)
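
As a rough illustration of the imitation learning step mentioned in the last highlight, the following Python sketch fits a ProMP from demonstrated trajectories: each demonstration is projected onto basis functions by ridge regression, and a Gaussian is fit over the resulting weight vectors. The basis type, widths, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def basis_matrix(z, n_basis=10, width=0.02):
    """Normalized Gaussian bases evaluated at phase values z (shape: [T, n_basis])."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * (z[:, None] - centers[None, :]) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)

def fit_promp(demos, n_basis=10, ridge=1e-6):
    """Fit a weight vector per demonstration by ridge regression,
    then a Gaussian over the weight vectors."""
    weights = []
    for y in demos:                      # y: [T] positions of one demonstration
        z = np.linspace(0.0, 1.0, len(y))
        phi = basis_matrix(z, n_basis)
        w = np.linalg.solve(phi.T @ phi + ridge * np.eye(n_basis), phi.T @ y)
        weights.append(w)
    W = np.stack(weights)                # [n_demos, n_basis]
    mu_w = W.mean(axis=0)
    sigma_w = np.cov(W, rowvar=False) + ridge * np.eye(n_basis)
    return mu_w, sigma_w
```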

Summary

INTRODUCTION

Robot learning approaches such as policy search methods (Kober and Peters, 2010; Kormushev et al., 2010; Theodorou et al., 2010) have been very successful: Kormushev et al. (2010) learned to flip pancakes and Kober and Peters (2010) learned the game ball-in-the-cup. Using a probabilistic model-fitting approach to compute the policy update has the important advantage that we can draw on a large toolbox of algorithms for estimating structured probabilistic models, such as the expectation-maximization algorithm (Dempster et al., 1977) or variational inference (Neal and Hinton, 1998), and it does not require a user-specified learning rate. Some imitation learning approaches fit a Gaussian process model to represent the policy of a hidden state. The advantage of these imitation learning approaches is that we can estimate the temporal structure of the modular control policy, i.e., when to switch from one building block to the next. Estimating the duration of the building blocks from the given trajectory data seems to be a fruitful and more general approach.
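
To make the model-fitting view of the policy update concrete, here is a minimal Python sketch of an episodic, information-theoretic update in the spirit of REPS: sampled policy parameters are re-weighted by their exponentiated returns, with the temperature set by a KL bound rather than a hand-tuned learning rate, and the new Gaussian search distribution is obtained by a weighted maximum-likelihood fit. The function names, the bound `epsilon`, and the use of SciPy's optimizer are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def reps_weights(returns, epsilon=0.5):
    """Solve the REPS dual for the temperature eta and return sample weights."""
    r = returns - returns.max()          # shift returns for numerical stability
    def dual(v):
        eta = np.exp(v[0])               # optimize log(eta) so eta stays positive
        return eta * epsilon + eta * np.log(np.mean(np.exp(r / eta)))
    eta = np.exp(minimize(dual, x0=[0.0]).x[0])
    w = np.exp(r / eta)
    return w / w.sum()

def update_gaussian_policy(params, returns, epsilon=0.5):
    """Weighted maximum-likelihood refit of a Gaussian over policy parameters.

    params: [n_samples, dim] sampled parameter vectors; returns: [n_samples].
    """
    w = reps_weights(returns, epsilon)
    mu = w @ params
    diff = params - mu
    cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(params.shape[1])
    return mu, cov
```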

INFORMATION THEORETIC POLICY SEARCH FOR LEARNING MODULAR CONTROL POLICIES
LEARNING TO SELECT THE BUILDING BLOCKS
Experimental evaluation of the selection of building blocks: robot tetherball
LEARNING TO SEQUENCE THE BUILDING BLOCKS
PROBABILISTIC MOVEMENT PRIMITIVES
PROBABILISTIC TRAJECTORY REPRESENTATION
Adaptation of the building blocks by conditioning
Combination and blending by multiplying distributions
CONCLUSION AND FUTURE WORK
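
The two subsection headings above on adaptation and combination refer to standard Gaussian operations on the ProMP weight distribution. As a brief, hypothetical illustration (variable names and the noise level are assumptions): conditioning adapts a primitive so that it passes through a desired via-point, and multiplying two Gaussian state distributions blends two co-activated primitives.

```python
import numpy as np

def condition_promp(mu_w, sigma_w, phi_t, y_star, sigma_y=1e-4):
    """Condition the weight distribution N(mu_w, sigma_w) on observing
    y_star at time t, where phi_t holds the basis function values at t."""
    tmp = sigma_w @ phi_t                     # [n_basis]
    gain = tmp / (sigma_y + phi_t @ tmp)      # Kalman-style gain vector
    mu_new = mu_w + gain * (y_star - phi_t @ mu_w)
    sigma_new = sigma_w - np.outer(gain, tmp)
    return mu_new, sigma_new

def blend_gaussians(mu1, var1, mu2, var2):
    """Product of two Gaussian state distributions: the blended primitive
    follows each component where its variance (uncertainty) is low."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var
```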