Discounting as task termination, and its implications

Paul Schrater1* and Constantin Rothkopf2

1 University of Minnesota, United States
2 Frankfurt Institute for Advanced Studies, Germany

Discounting is a natural part of the formulation of most sequential decision problems and has been reported to underlie human and animal behavior in numerous empirical studies. Yet discounting is typically treated as an arbitrary parameter, needed only to bound the expected future reward, and is held constant across all states. Discounting can instead be interpreted as the probability of task termination. We show that assigning individual discount rates to separate states provides a framework for exploration bonuses, a rational basis for bounded computation, and a basis for the automatic construction of stochastic options, or macro actions.

Our results capitalize on work by Sonin (2008), who generalizes the Gittins index to a Markov chain with state-dependent discount rates. This generalized index shows that the exploratory bonus for an option is the reciprocal of the probability of task termination. Moreover, a recursive algorithm computes the generalized index and, in doing so, orders the states by obtainable reward. Furthermore, states that are traversed on the way to high-reward states can be eliminated, producing abstract states with new corresponding transition dynamics and task-termination probabilities. We show how to use this elimination method to construct stochastic options (sub-policies) that find these abstract states, and that the choice between options can be computed via an index function.
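The reading of discounting as task termination can be checked numerically. The sketch below is illustrative only and is not the authors' method or Sonin's index algorithm. It assumes a small Markov chain with made-up transition matrix `P`, rewards `r`, and per-state continuation probabilities `gamma` (so the task ends with probability `1 - gamma[s]` after a step taken in state `s`). Under those assumptions, the value obtained with state-dependent discounting should match the expected undiscounted reward accumulated until the task terminates.

```python
import numpy as np

def discounted_value(P, r, gamma):
    """Solve V = r + diag(gamma) @ P @ V, treating gamma[s] as a discount factor."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma[:, None] * P, r)

def terminating_value(P, r, gamma, horizon=2000):
    """Expected undiscounted total reward when the task survives a step in
    state s with probability gamma[s] and otherwise terminates."""
    n = len(r)
    survive_and_move = gamma[:, None] * P  # P(survive in s, then move to s')
    alive = np.eye(n)  # alive[s0, s]: P(task still running and in s | start s0)
    total = np.zeros(n)
    for _ in range(horizon):
        total += alive @ r              # reward collected while still running
        alive = alive @ survive_and_move
    return total

# Hypothetical 3-state chain; all numbers are assumptions chosen for illustration.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.2, 0.0, 0.8]])
r = np.array([0.0, 1.0, 5.0])
gamma = np.array([0.95, 0.80, 0.60])

v_discounted = discounted_value(P, r, gamma)
v_terminating = terminating_value(P, r, gamma)
print(v_discounted)
print(v_terminating)
```

With a unit reward in every state and a constant continuation probability, the same functions return the expected task duration, which is the reciprocal of the termination probability, 1 / (1 - gamma).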
Using the termination probability formulation, we derive exploratory incentives for each option and show that transition uncertainty reduces the exploration incentive. In particular, the incentive is reduced by the uncertainty over the set of next states, which acts effectively as a branching factor on the look-ahead. This result provides a rational basis for bounding computation in model-based reinforcement learning.

We apply the framework to the "chain game," a well-known reinforcement learning problem that is challenging for exploration. Given enough experience with the transition probabilities, our analysis decomposes the problem into a simple binary choice between two options, and we can quantify the difficulty of learning the better option. Humans placed in this environment fell into one of two distinct groups: one group performed enough unrewarded exploratory actions to find the better option, while the second group under-explored and settled on the worse option. In debriefing, subjects in the former group reported finding the worse option quickly, but believed that higher rewards were possible and thus continued exploring. Conversely, subjects in the latter group typically reported an initial exploratory phase, but upon finding the worse option believed the rest of the chain wasn’t worth exploring.

We believe these results provide new insight into the relationship between abstraction, exploration, and the overall chance of task termination, and give a computational basis for understanding why exploration should be tied to competence or effectiveness, i.e. the ability to complete the task.

Conference: Computational and Systems Neuroscience 2010, Salt Lake City, UT, United States, 25 Feb - 2 Mar, 2010.
Presentation Type: Poster Presentation
Topic: Poster session III
Citation: Schrater P and Rothkopf C (2010). Discounting as task termination, and its implications. Front. Neurosci. Conference Abstract: Computational and Systems Neuroscience 2010.
doi: 10.3389/conf.fnins.2010.03.00135
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received: 01 Mar 2010; Published Online: 01 Mar 2010.
* Correspondence: Paul Schrater, University of Minnesota, Minneapolis, United States, schrater@cs.umn.edu