Abstract
Organisms use rewards to navigate and adapt to (uncertain) environments. Error-based learning about rewards is supported by the dopaminergic system, which is thought to signal reward prediction errors that are used to adjust past predictions. More recently, the phasic dopamine response was suggested to have two components: the first, rapid component is thought to signal the detection of a potentially rewarding stimulus; the second, slightly later component characterizes the stimulus by its reward prediction error. Error-based learning signals have also been found for risk. However, whether the neural generators of these signals employ a two-component coding scheme like the dopaminergic system is unknown. Here, using human high-density EEG, we ask whether risk learning, or more generally surprise-based learning under uncertainty, similarly comprises two temporally dissociable components. Using a simple card game, we show that the risk prediction error is reflected in the amplitude of the P3b component. This P3b modulation is preceded by an earlier component that is modulated by stimulus salience. Source analyses are compatible with the idea that both the early salience signal and the later risk prediction error signal are generated in insular, frontal, and temporal cortex. The identified sources are parts of the risk processing network that receives input from noradrenergic cells in the locus coeruleus. Finally, the P3b amplitude modulation is mirrored by an analogous modulation of pupil size, which is consistent with the idea that both the P3b and pupil size indirectly reflect locus coeruleus activity.
Highlights
Reward is crucial for adapting to constantly changing environments
Using human fMRI and pupillometry, we showed that noradrenaline-mediated risk prediction error processing resembles the dopamine-mediated processing of reward prediction errors (Preuschoff et al, 2008, 2011)
It was proposed that the reward prediction error is encoded by activity of midbrain dopaminergic neurons
Summary
Reward-based learning is well captured by reinforcement learning models (Sutton and Barto, 1998) that use the reward prediction error as a learning signal. The activity of midbrain dopaminergic neurons shows a remarkable correlation with the reward prediction error (see Schultz, 2015, for a review). It was recently discovered that the phasic dopamine response comprises two components (see Schultz, 2016, for a review). The sensory component signals the salience of a stimulus, independent of its reward value (Day et al, 2007; Kobayashi and Schultz, 2014). The value component is sensitive to the motivational salience of the stimulus, e.g., the reward prediction error (Waelti et al, 2001).
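To make the role of the reward prediction error as a learning signal concrete, the following is a minimal sketch of a delta-rule update in the spirit of the reinforcement learning models cited above. It is an illustration under stated assumptions, not the model used in the study: the function name `delta_rule_learning`, the learning rate `alpha`, and the initial estimate `v0` are all hypothetical choices made for this example.

```python
def delta_rule_learning(rewards, alpha=0.1, v0=0.0):
    """Track an expected-reward estimate from a sequence of rewards.

    Assumption: a simple delta rule, in which the estimate is moved
    toward each observed reward by a fraction `alpha` of the
    reward prediction error.
    """
    value = v0
    prediction_errors = []
    for reward in rewards:
        # Reward prediction error: received reward minus predicted reward.
        delta = reward - value
        # Adjust the prediction in proportion to the error.
        value += alpha * delta
        prediction_errors.append(delta)
    return value, prediction_errors


# Example: a reward of 1.0 delivered on four consecutive trials.
final_value, prediction_errors = delta_rule_learning(
    [1.0, 1.0, 1.0, 1.0], alpha=0.5
)
print(final_value)        # 0.9375
print(prediction_errors)  # [1.0, 0.5, 0.25, 0.125]
```

In this sketch the prediction error shrinks as the estimate approaches the delivered reward, which is the qualitative behaviour a prediction-error learning signal is expected to show once a reward becomes fully predicted.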