Value function uncertainty as a cognitive map for reinforcement learning

Temporal-difference (TD) reinforcement learning (RL) methods underlie prominent accounts of dopamine neuron spiking. However, it has long been known that these theories are not, by themselves, an adequate account of animal conditioning behavior. A key challenge for such theories is Tolman's (1932) demonstration of "latent learning" in spatial tasks: rats are faster at learning to traverse a maze to obtain food in a particular location if they have previously been exposed to the maze without reward. This phenomenon is normally understood to suggest that rats learn a representation of the spatial configuration of the maze (a "cognitive map") during preexposure and use it to plan actions toward a subsequently discovered goal. This is consistent with "model-based" RL methods, but not with standard TD algorithms, which are "model-free" in the sense that they do not represent any information about task contingencies, such as a spatial map, but instead learn only a value function measuring the proximity of states (e.g., maze locations) to reward. These methods, accordingly, learn nothing during maze preexposure and exhibit no latent learning. Because of these and similar experiments, it has been proposed that the purported model-free dopaminergic RL system is accompanied by a separate, more cognitive, model-based planning system (Daw et al., 2005). Here we reconsider these issues in the context of Bayesian versions of TD, which, instead of maintaining a point estimate of the value function, use Bayes' theorem to maintain a distribution over values. In particular, we consider a theory based on Gaussian Process TD (Engel et al., 2003), which represents uncertainty about states' values not just for each state separately, but jointly, using a full state-state covariance matrix.
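The model-free limitation described above can be made concrete. The following is a minimal sketch (an illustration, not the authors' simulation) of tabular TD(0) on a hypothetical five-state corridor maze: unrewarded preexposure produces no learning at all, and a single rewarded traversal updates only the goal's immediate predecessor.

```python
import numpy as np

# Hypothetical 5-state corridor maze: 0 -> 1 -> 2 -> 3 -> 4 (goal).
n_states = 5
gamma, alpha = 0.9, 0.1

def td0_episode(V, rewarded):
    """One left-to-right traversal with tabular TD(0) updates."""
    for s in range(n_states - 1):
        s_next = s + 1
        r = 1.0 if (rewarded and s_next == n_states - 1) else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

# Preexposure: 100 traversals with no reward anywhere.
V = np.zeros(n_states)
for _ in range(100):
    td0_episode(V, rewarded=False)
print(V)  # [0. 0. 0. 0. 0.] -- every prediction error was zero

# A single rewarded traversal: only the goal's predecessor changes.
td0_episode(V, rewarded=True)
print(V)  # only V[3] is nonzero; credit has not spread through the maze
```

Because the point-estimate learner carries nothing out of preexposure, it predicts no latent-learning advantage: propagating value back through the whole corridor would require many further rewarded episodes.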
We show that with learning, the structure of the posterior covariance captures the transition dynamics of the task (e.g., states' spatial proximity), like a cognitive map, and that this information facilitates subsequent learning. In simulations, the covariance learned during preexposure allows the model to reproduce the latent learning effect because it enables a single subsequent experience with reward at the goal to update the value estimates for all states in the maze. These findings forge an unexpected connection between research on how uncertainty modulates learning in conditioning (extending Kakade and Dayan's (2000) account of retrospective revaluation) and other work on accelerating learning in RL using basis functions that allow experience to generalize between "nearby" states. In particular, we demonstrate a formal relationship between the posterior value covariance and the "successor representation" basis for generalization in TD (Dayan, 1993). More broadly, the results suggest that cognitive maps (and neural systems thought to subserve them, such as the hippocampus) may be interpreted in terms of uncertainty as well as spatial representation, and may allow knowledge about task structure to be integrated with value estimates in a way that combines the strengths of both model-free and model-based RL approaches.

Conference: Computational and Systems Neuroscience 2009, Salt Lake City, UT, United States, 26 Feb - 3 Mar, 2009.
Presentation Type: Poster Presentation
Topic: Poster Presentations
Citation: (2009). Value function uncertainty as a cognitive map for reinforcement learning. Front. Syst. Neurosci. Conference Abstract: Computational and systems neuroscience 2009. doi: 10.3389/conf.neuro.06.2009.03.105
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
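The successor-representation connection noted in the abstract can be illustrated directly. As a hedged sketch (again assuming the same hypothetical corridor maze, not the authors' actual simulation), the successor matrix M = (I - gamma*P)^(-1) depends only on transitions, so it can be acquired during unrewarded preexposure; a single reward observation at the goal then yields graded values for every state at once.

```python
import numpy as np

# Same hypothetical corridor: deterministic steps toward an absorbing goal.
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, s + 1] = 1.0
P[n - 1, n - 1] = 1.0  # goal is absorbing

# Successor representation (Dayan, 1993): M[s, s'] is the expected
# discounted future occupancy of s' starting from s.  It is a function
# of transitions alone, so unrewarded preexposure suffices to learn it.
M = np.linalg.inv(np.eye(n) - gamma * P)

# One observation of reward at the goal updates all states at once:
r = np.zeros(n)
r[-1] = 1.0
V = M @ r
print(V)  # values rise toward the goal, approx. [6.56, 7.29, 8.1, 9.0, 10.0]
```

This one-shot generalization is the latent-learning signature; the abstract's formal result is that the GPTD posterior covariance learned during preexposure plays a role analogous to M here.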
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers' terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 02 Feb 2009; Published Online: 02 Feb 2009.