Abstract

The Human Brain Computes Two Different Prediction Errors

Reinforcement learning (RL) provides a framework comprising two distinct approaches to reward-based decision making: model-free RL assesses candidate actions by directly learning their expected long-term reward consequences using a reward prediction error (RPE), whereas model-based RL uses experience with the sequential occurrence of situations ('states') to build a model of the state-transition and outcome structure of the environment and then searches forward in it to evaluate actions. This latter, model-based approach requires a state prediction error (SPE), which trains predictions about the transitions between different states in the world rather than about summed future rewards.

Eighteen human subjects performed a probabilistic Markov decision task while being scanned with functional magnetic resonance imaging. The task required subjects to make two sequential choices, the first leading them probabilistically into an intermediary state, and the second into one of three outcome states associated with different reward magnitudes. To dissociate model-based from model-free RL completely, we exposed our subjects during the first scanning session only to transitions in the state space, in the complete absence of rewards or free choices. This permits a pure assessment of the model-building aspect of model-based RL, because model-free RL cannot learn about future expected rewards in their absence, and the RPE is therefore nil. Prior to the second, free-choice session, subjects were exposed to the rewards that would be available at the outcome states. Our subjects demonstrated the essential model-based RL competence of combining information about the structure of the state space with the reward information, making more optimal choices at the beginning of the free-choice session than would have been expected by chance (p<0.05, sign test, one-tailed).
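The dissociation between the two error signals can be illustrated as simple tabular update rules. The following is a minimal sketch, not the authors' fitted model: the state-space size, learning rate, and the particular update forms are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 6, 2   # hypothetical: one start, two intermediary, three outcome states
alpha, gamma = 0.1, 1.0      # assumed learning rate and discount factor

# --- Model-free learner (SARSA): an RPE trains long-run reward predictions ---
Q = np.zeros((n_states, n_actions))

def sarsa_update(s, a, r, s_next, a_next):
    """RPE = r + gamma*Q(s',a') - Q(s,a); nil when no rewards are ever delivered."""
    rpe = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * rpe
    return rpe

# --- Model-based learner (FORWARD): an SPE trains the transition model ---
T = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # uniform prior over successors

def spe_update(s, a, s_next):
    """SPE = 1 - T(s'|s,a): surprise about the observed transition.
    This learning proceeds even in the complete absence of reward."""
    spe = 1.0 - T[s, a, s_next]
    T[s, a] *= (1.0 - alpha)    # decay all successor probabilities...
    T[s, a, s_next] += alpha    # ...and shift mass toward the observed state
    return spe

# Demo: an unrewarded transition yields a nil RPE but a positive SPE
rpe0 = sarsa_update(0, 0, 0.0, 1, 0)   # 0.0 with all-zero Q and no reward
spe0 = spe_update(0, 0, 1)             # 1 - 1/6 under the uniform prior
```

In the full FORWARD learner, the learned transition model is combined with the reward information and evaluated by dynamic programming; a combined model would then weight the resulting choice preferences against those of SARSA.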
In order to assess the neural signatures of these two error signals, we formalized the computational approaches as trial-by-trial mathematical models and determined their free parameters by fitting the models to behavioral choices. An RPE and an SPE were derived from a model-free SARSA learner and a model-based FORWARD learner, respectively. Choices for the FORWARD learner were computed using dynamic programming. A combined model was derived by weighting the choice preferences of SARSA and FORWARD; the relative influence of the latter was found to decrease over trials. The trial-by-trial reward and state error signals derived from the two model components were included in the analysis of the imaging data in order to identify correlated neural signals. We found evidence of a neural state prediction error in addition to the previously well-characterized RPE. The SPE was present bilaterally in the intraparietal sulcus (IPS) and lateral prefrontal cortex (latPFC), and was clearly dissociable from the RPE, located predominantly in the ventral striatum (all regions p<0.05, whole-brain corrected). Importantly, the left latPFC and right IPS also correlated with the SPE during the non-rewarded first session, underlining their importance in pure state-space learning. These findings provide evidence for the existence of two distinct forms of learning signal in humans, which may form the basis of distinct computational strategies for guiding behavior.

Conference: Computational and Systems Neuroscience 2009, Salt Lake City, UT, United States, 26 Feb - 3 Mar, 2009.
Presentation Type: Poster Presentation
Topic: Poster Presentations
Citation: (2009). The Human Brain Computes Two Different Prediction Errors. Front. Syst. Neurosci. Conference Abstract: Computational and systems neuroscience 2009. doi: 10.3389/conf.neuro.06.2009.03.270
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers' terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received: 04 Feb 2009; Published Online: 04 Feb 2009.
