Value function uncertainty as a cognitive map for reinforcement learning

Temporal-difference (TD) reinforcement learning (RL) methods underlie prominent accounts of dopamine neuron spiking. However, it has long been known that these theories are not, by themselves, an adequate account of animal conditioning behavior. A key challenge for such theories is Tolman's (1932) demonstration of "latent learning" in spatial tasks: rats are faster at learning to traverse a maze to obtain food in a particular location if they have previously been exposed to the maze without reward. This phenomenon is normally understood to suggest that rats learn a representation of the spatial configuration of the maze (a "cognitive map") during preexposure and use it to plan actions toward a subsequently discovered goal. This is consistent with "model-based" RL methods, but not with standard TD algorithms, which are "model-free" in the sense that they do not represent any information about task contingencies, such as a spatial map, but instead learn only a value function measuring the proximity of states (e.g., maze locations) to reward. These methods, accordingly, learn nothing during maze preexposure and exhibit no latent learning. Because of these and similar experiments, it has been proposed that the purported model-free dopaminergic RL system is accompanied by a separate, more cognitive, model-based planning system (Daw et al., 2005). Here we reconsider these issues in the context of Bayesian versions of TD, which, instead of maintaining a point estimate of the value function, use Bayes' theorem to maintain a distribution over values. In particular, we consider a theory based on Gaussian Process TD (Engel et al., 2003), which represents uncertainty about states' values not just for each state separately, but jointly, using a full state-state covariance matrix.
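The model-free limitation described above can be made concrete. The following is a minimal sketch (an illustration, not the authors' simulation) of tabular TD(0) on a hypothetical five-state corridor maze: unrewarded preexposure produces no learning at all, and a single rewarded traversal updates only the goal's immediate predecessor.

```python
import numpy as np

# Hypothetical 5-state corridor maze: 0 -> 1 -> 2 -> 3 -> 4 (goal).
n_states = 5
gamma, alpha = 0.9, 0.1

def td0_episode(V, rewarded):
    """One left-to-right traversal with tabular TD(0) updates."""
    for s in range(n_states - 1):
        s_next = s + 1
        r = 1.0 if (rewarded and s_next == n_states - 1) else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

# Preexposure: 100 traversals with no reward anywhere.
V = np.zeros(n_states)
for _ in range(100):
    td0_episode(V, rewarded=False)
print(V)  # [0. 0. 0. 0. 0.] -- every prediction error was zero

# A single rewarded traversal: only the goal's predecessor changes.
td0_episode(V, rewarded=True)
print(V)  # only V[3] is nonzero; credit has not spread through the maze
```

Because the point-estimate learner carries nothing out of preexposure, it predicts no latent-learning advantage: propagating value back through the whole corridor would require many further rewarded episodes.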
We show that with learning, the structure of the posterior covariance captures the transition dynamics of the task (e.g., states' spatial proximity), like a cognitive map, and that this information facilitates subsequent learning. In simulations, the covariance learned during preexposure allows the model to reproduce the latent learning effect because it enables a single subsequent experience with reward at the goal to update the value estimates for all states in the maze. These findings forge an unexpected connection between research on how uncertainty modulates learning in conditioning (extending Kakade and Dayan's (2000) account of retrospective revaluation) and other work on accelerating learning in RL using basis functions that allow experience to generalize between "nearby" states. In particular, we demonstrate a formal relationship between the posterior value covariance and the "successor representation" basis for generalization in TD (Dayan, 1993). More broadly, the results suggest that cognitive maps (and neural systems thought to subserve them, such as the hippocampus) may be interpreted in terms of uncertainty as well as spatial representation, and may allow knowledge about task structure to be integrated with value estimates in a way that combines the strengths of both model-free and model-based RL approaches.

Conference: Computational and Systems Neuroscience 2009, Salt Lake City, UT, United States, 26 Feb - 3 Mar, 2009.
Presentation Type: Poster Presentation
Topic: Poster Presentations
Citation: (2009). Value function uncertainty as a cognitive map for reinforcement learning. Front. Syst. Neurosci. Conference Abstract: Computational and systems neuroscience 2009. doi: 10.3389/conf.neuro.06.2009.03.105
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
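The successor-representation connection noted in the abstract can be illustrated directly. As a hedged sketch (again assuming the same hypothetical corridor maze, not the authors' actual simulation), the successor matrix M = (I - gamma*P)^(-1) depends only on transitions, so it can be acquired during unrewarded preexposure; a single reward observation at the goal then yields graded values for every state at once.

```python
import numpy as np

# Same hypothetical corridor: deterministic steps toward an absorbing goal.
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, s + 1] = 1.0
P[n - 1, n - 1] = 1.0  # goal is absorbing

# Successor representation (Dayan, 1993): M[s, s'] is the expected
# discounted future occupancy of s' starting from s.  It is a function
# of transitions alone, so unrewarded preexposure suffices to learn it.
M = np.linalg.inv(np.eye(n) - gamma * P)

# One observation of reward at the goal updates all states at once:
r = np.zeros(n)
r[-1] = 1.0
V = M @ r
print(V)  # values rise toward the goal, approx. [6.56, 7.29, 8.1, 9.0, 10.0]
```

This one-shot generalization is the latent-learning signature; the abstract's formal result is that the GPTD posterior covariance learned during preexposure plays a role analogous to M here.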
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers' terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 02 Feb 2009; Published Online: 02 Feb 2009.