Stimulus-dependent adjustment of reward prediction error in the midbrain.

Hiromasa Takemura, Jiro Okuda, Rufin Vogels, Kazuyuki Samejima, Masamichi Sakagami, Jan Lauwereyns

doi:10.1371/journal.pone.0028337

Open Access

https://doi.org/10.1371/journal.pone.0028337

Journal: PLoS ONE
Publication Date: Dec 2, 2011
Citations: 41
License type: CC BY 4.0

Affiliation: The University of Tokyo, KU Leuven, Tamagawa University

Abstract

Previous reports have described that neural activities in midbrain dopamine areas are sensitive to unexpected reward delivery and omission. These activities are correlated with the reward prediction error in reinforcement learning models, defined as the difference between the predicted reward value and the obtained reward outcome. These findings suggest that the reward prediction error signal in the brain updates reward prediction through stimulus–reward experiences. It remains unknown, however, how sensory processing of reward-predicting stimuli contributes to the computation of reward prediction error. To elucidate this issue, we examined the relation between the discriminability of reward-predicting stimuli and the reward prediction error signal in the brain using functional magnetic resonance imaging (fMRI). Before the main experiments, subjects learned an association between the orientation of a perceptually salient (high-contrast) Gabor patch and a juice reward. The subjects were then presented with lower-contrast Gabor patch stimuli to predict a reward. We calculated the correlation between fMRI signals and reward prediction error in two reinforcement learning models: a model including the modulation of reward prediction by stimulus discriminability and a model excluding this modulation. Results showed that fMRI signals in the midbrain were more highly correlated with reward prediction error in the model that includes stimulus discriminability than in the model that excludes it. No regions showed higher correlation with the model that excludes stimulus discriminability. Moreover, the difference in correlation between the two models was significant from the first session of the experiment, suggesting that the reward computation in the midbrain was modulated based on stimulus discriminability before a new contingency between perceptually ambiguous stimuli and a reward was learned. These results suggest that the human reward system can flexibly incorporate the level of stimulus discriminability into reward computations by modulating previously acquired reward values for a typical stimulus.
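The contrast between the two models can be illustrated with a minimal sketch. The abstract does not give the models' equations, so everything below is an assumption made for illustration: the function names, the treatment of discriminability as the probability of correctly identifying the patch orientation, and the weighted-average form of the modulated prediction. It is not the authors' implementation.

```python
def rescorla_wagner_update(value, reward, learning_rate):
    """One-trial update: return (new_value, prediction_error)."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error, prediction_error


def predicted_value_without(value_rewarded, value_unrewarded, is_rewarded_orientation):
    # Model excluding discriminability: the prediction depends only on the
    # true orientation, no matter how hard the stimulus is to see.
    return value_rewarded if is_rewarded_orientation else value_unrewarded


def predicted_value_with(value_rewarded, value_unrewarded,
                         is_rewarded_orientation, discriminability):
    # ASSUMPTION (not from the source): discriminability is the probability,
    # between 0.5 and 1.0, of identifying the orientation correctly, and the
    # prediction is the average of the two learned values weighted by it.
    p_rewarded = discriminability if is_rewarded_orientation else 1.0 - discriminability
    return p_rewarded * value_rewarded + (1.0 - p_rewarded) * value_unrewarded


# Hypothetical values learned beforehand with high-contrast patches.
v_rewarded, v_unrewarded = 1.0, 0.0
reward = 1.0  # juice delivered after a low-contrast rewarded-orientation patch

pe_without = reward - predicted_value_without(v_rewarded, v_unrewarded, True)
pe_with = reward - predicted_value_with(v_rewarded, v_unrewarded, True, 0.75)
print(pe_without, pe_with)
```

Under these assumptions, a fully learned value leaves no prediction error when discriminability is ignored, whereas the modulated prediction is lower for a hard-to-see stimulus and the same reward therefore produces a positive prediction error.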

Highlights

Reward prediction is an important function used by humans and animals to make appropriate decisions in various environments.
No other area showed a significant difference in effect size between models. These results demonstrated that the neural activity in the midbrain is correlated significantly with the reward prediction error in the reinforcement learning model that includes the factor of stimulus discriminability level (WITH model).
Higher correlation with the WITH model was observed consistently across the wide range of learning rates we tested, and no area showed higher correlation with the reward prediction error in the WITHOUT model than with that in the WITH model. This difference in correlation between models appeared from the first session of the experiment. These results support the view that the human reward system can incorporate the level of discriminability of perceptually degraded stimuli when calculating the reward prediction error, by adaptively modulating already-acquired reward values for distinctive stimuli according to stimulus discriminability information on a stimulus-by-stimulus basis.

Summary

Introduction

Reward prediction is an important function used by humans and animals to make appropriate decisions in various environments. Computational studies have described reward prediction error activity using reinforcement learning models such as the Rescorla–Wagner model and the temporal difference (TD) model [2,7,9,10,11,12,13,14]. These results suggest that the reward prediction error signal is represented in midbrain dopamine neurons and that it is used for updating the association between reward prediction and sensory stimuli.
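For reference, the two model families named above are commonly written in the following textbook form. The symbols are not taken from this text and are assumptions for illustration: $r_t$ is the reward on trial or time step $t$, $V(s)$ the predicted value of stimulus or state $s$, $\alpha$ the learning rate, and $\gamma$ the discount factor.

```latex
\begin{align}
  % Rescorla--Wagner: prediction error and value update
  \delta_t &= r_t - V_t(s_t), &
  V_{t+1}(s_t) &= V_t(s_t) + \alpha\,\delta_t \\
  % Temporal difference (TD): the error also includes the discounted next-state value
  \delta_t^{\mathrm{TD}} &= r_t + \gamma\,V_t(s_{t+1}) - V_t(s_t), &
  V_{t+1}(s_t) &= V_t(s_t) + \alpha\,\delta_t^{\mathrm{TD}}
\end{align}
```

In both cases the prediction error $\delta$ is the quantity compared against neural activity, and a positive $\delta$ (reward better than predicted) raises the stored value of the preceding stimulus.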
