Abstract

Learning hierarchical controllers through reinforcement learning

Hazem Toutounji¹*, Constantin A. Rothkopf¹ and Jochen Triesch¹

¹ Johann Wolfgang Goethe University, Frankfurt Institute for Advanced Studies, Germany

Reinforcement Learning (RL) comprises a set of algorithms for solving the optimal control problem, i.e. carrying out actions in order to maximize collected reward. It is expensive, however, in storage, computation, and learning time to encode every dimension potentially relevant to a task. There is nonetheless consistent evidence, including electrophysiological recordings in mammals and fMRI studies in humans, that the brain computes with variables present in temporal difference (TD) algorithms of RL [1], which suggests the existence of learning algorithms capable of reducing the dimensionality of the problem and hence the learning time. For independent action spaces, one can learn independent controllers for the separate variables describing the full state space. The question arises, however, how to handle the case in which the transition dynamics of multiple controllers are not independent of one another. Here we consider the case in which the actions of one controller affect the transition dynamics of a second controller, as commonly occurs in motor control. As a simple example, we consider a 2-dimensional state space in which a learner, starting from a random initial condition, attempts to reach a goal state where the reward is maximal. The reward decays quadratically with distance from the goal state until it reaches a minimum value. At each state the learner can perform any possible 2-dimensional action. The transition dynamics in the horizontal direction depend on the magnitude of the action in the vertical direction: a bigger movement in the vertical direction results in cross-talk, i.e. a bigger overshoot in the horizontal direction.

To solve this task we propose and implement a solution in which each dimension is initially learned by an independent controller. A third controller learns to coordinate these two controllers by performing corrective actions. Importantly, this third controller learns a policy in the horizontal direction over the whole 2-dimensional state space, but at a lower resolution. Learning in all controllers is triggered by a temporal difference error in predicting the reward and uses the SARSA algorithm [2]. The upper controller can start learning either after the lower controllers have already learned a value function or right from the beginning. In both cases, the multiple-controller solution learns significantly faster than a controller using the full joint state space. Furthermore, for very large spaces only the composite controllers are able to find a solution in the allotted time. In addition, for 2-dimensional state spaces of size n², the storage needed for value functions is reduced from O(n⁴) in the full-controller case to O(n²) in the hierarchical case. Thus, we show that a hierarchical system, in which lower-level controllers with dependent transition dynamics are coordinated by an upper low-resolution controller, is capable of learning a near-optimal solution while saving computation time, storage space, and learning time.

References

[1] Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology 53, 139–154.

[2] Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department.

Keywords: computational neuroscience

Conference: Bernstein Conference on Computational Neuroscience, Berlin, Germany, 27 Sep - 1 Oct, 2010.

Presentation Type: Presentation

Topic: Bernstein Conference on Computational Neuroscience

Citation: Toutounji H, Rothkopf CA and Triesch J (2010). Learning hierarchical controllers through reinforcement learning. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference on Computational Neuroscience.
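The cross-talk task and the SARSA learning rule described in the abstract can be sketched in a few lines. This is a minimal illustration only: the grid size, reward constants, coupling strength, and coarse-resolution factor below are assumptions for exposition, not values from the authors' implementation.

```python
import numpy as np

# Illustrative sketch of the task and learning rule; all constants
# (N, CROSS_TALK, coarse factor m) are assumed, not from the paper.

N = 20                        # states per dimension (n x n grid); assumed
GOAL = (N // 2, N // 2)
R_MAX, R_MIN = 1.0, -1.0      # reward peaks at the goal, decays quadratically
CROSS_TALK = 0.5              # vertical-to-horizontal coupling strength; assumed

def reward(x, y):
    """Quadratic decay with distance from the goal, clipped at a minimum."""
    d2 = (x - GOAL[0]) ** 2 + (y - GOAL[1]) ** 2
    return max(R_MAX - 0.05 * d2, R_MIN)

def step(x, y, dx, dy):
    """A larger vertical action |dy| causes a horizontal overshoot."""
    direction = 1 if dx > 0 else -1 if dx < 0 else 0
    overshoot = int(round(CROSS_TALK * abs(dy))) * direction
    nx = min(max(x + dx + overshoot, 0), N - 1)
    ny = min(max(y + dy, 0), N - 1)
    return nx, ny, reward(nx, ny)

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One SARSA step: the TD error in predicting reward drives learning."""
    td_error = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error

# Storage comparison for the value functions, assuming the number of
# 1-D actions also scales with n:
k = N                                  # 1-D actions per dimension; assumed
q_full = np.zeros((N, N, k, k))        # full joint controller: O(n^4) entries
m = max(N // 5, 1)                     # coarse resolution of the coordinator
q_x = np.zeros((N, k))                 # horizontal controller: O(n^2)
q_y = np.zeros((N, k))                 # vertical controller:   O(n^2)
q_coord = np.zeros((m, m, k))          # low-resolution corrective policy
assert q_x.size + q_y.size + q_coord.size < q_full.size
```

The final assertion makes the storage argument concrete: the two 1-D tables plus a coarse coordinator table grow as O(n²), while the full joint table grows as O(n⁴).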
doi: 10.3389/conf.fncom.2010.51.00017

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 22 Sep 2010; Published Online: 23 Sep 2010.

* Correspondence: Dr. Hazem Toutounji, Johann Wolfgang Goethe University, Frankfurt Institute for Advanced Studies, Frankfurt, Germany, hazem.toutounji@nottingham.ac.uk
