Abstract

It has been suggested that the midbrain dopamine (DA) neurons, receiving inputs from the cortico-basal ganglia (CBG) circuits and the brainstem, compute reward prediction error (RPE), the difference between reward obtained or expected to be obtained and reward that had been expected to be obtained. These reward expectations are suggested to be stored in the CBG synapses and updated according to RPE through synaptic plasticity, which is induced by released DA. Together, these constitute the “DA=RPE” hypothesis, which describes the mutual interaction between DA and the CBG circuits and serves as the primary working hypothesis in studying reward learning and value-based decision-making. However, recent work has revealed a new type of DA signal that appears not to represent RPE. Specifically, it has been found in a reward-associated maze task that striatal DA concentration primarily shows a gradual increase toward the goal. We explored whether such ramping DA could be explained by extending the “DA=RPE” hypothesis to take into account biological properties of the CBG circuits. In particular, we examined the effects of possible time-dependent decay of DA-dependent plastic changes of synaptic strengths by incorporating decay of learned values into the RPE-based reinforcement learning model and simulating reward learning tasks. We found that incorporating such a decay dramatically changes the model's behavior, causing gradual ramping of RPE. We further incorporated magnitude-dependence of the rate of decay, which could potentially be in accord with some past observations, and found that near-sigmoidal ramping of RPE, resembling the observed DA ramping, could then occur. Given that synaptic decay can be useful for flexibly reversing and updating the learned reward associations, especially when the baseline DA is low and encoding of negative RPE by DA is limited, the observed DA ramping would be indicative of the operation of such flexible reward learning.
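The idea of adding value decay to an RPE-based model can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear track of states with reward only at the last one, a standard temporal-difference error as the RPE, and a constant decay factor `kappa` applied to all learned values at every time step. The function name `simulate_rpe`, the parameter values, and the task layout are hypothetical choices made for illustration; the magnitude-dependent decay mentioned above is not modeled.

```python
def simulate_rpe(n_states=7, reward=1.0, alpha=0.5, gamma=1.0,
                 kappa=0.99, n_trials=1000):
    """Return the RPE at each state on the final trial.

    Assumed setup (illustrative only): the agent visits states
    0..n_states-1 in order and receives `reward` at the last state.
    kappa = 1.0 means no decay; kappa < 1.0 shrinks every learned
    value by that factor at each time step.
    """
    # One extra slot stands for the terminal state, whose value stays 0.
    values = [0.0] * (n_states + 1)
    rpes = [0.0] * n_states
    for _ in range(n_trials):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            # RPE: obtained reward plus expected future reward,
            # minus the reward that had been expected at this state.
            delta = r + gamma * values[s + 1] - values[s]
            # RPE-dependent update of the stored value.
            values[s] += alpha * delta
            # Time-dependent decay of all stored values.
            for k in range(n_states):
                values[k] *= kappa
            rpes[s] = delta
    return rpes


if __name__ == "__main__":
    print("no decay:  ", [round(d, 3) for d in simulate_rpe(kappa=1.0)])
    print("with decay:", [round(d, 3) for d in simulate_rpe(kappa=0.99)])
```

Under these assumptions, the RPE vanishes at every state once learning has converged without decay, whereas with decay it stays positive and grows toward the rewarded state, because values that keep decaying must be re-learned on every trial.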

Highlights

  • The midbrain dopamine (DA) neurons receive inputs from many brain regions, among which the basal ganglia (BG) are major sources (Watabe-Uchida et al, 2012).

  • Functional relevance of the decay of synaptic strength has been recently put forward (Hardt et al, 2013, 2014). In light of these findings and suggestions, in the present study we explored through computational modeling whether the observed gradual ramping of DA can be explained by extending the “DA=reward prediction error (RPE)” hypothesis by taking into account such possible decay of plastic changes of the synapses that store learned values. (Please note that we have tried to describe the basic idea of our modeling in the Results so that it can be followed without referring to the Methods.)

  • The discrepancy in the timing could be partially understood given that our model describes the temporal evolution of RPE, which is presumably first represented by the activity of DA neurons, whereas the experiments measured the concentration of DA presumably released from these neurons. A time lag between the two is expected, as suggested by the observed difference in latencies of DA neuronal firings (Schultz et al, 1997) and DA concentration changes (Hart et al, 2014).

Summary

Introduction

The midbrain dopamine (DA) neurons receive inputs from many brain regions, among which the basal ganglia (BG) are major sources (Watabe-Uchida et al, 2012). Released DA induces or significantly modulates plasticity of corticostriatal synapses (Calabresi et al, 1992; Reynolds et al, 2001; Shen et al, 2008) so that the values of stimuli or actions stored in these synapses are updated according to the RPE (Figure 1B). Such a suggested functional reciprocity between the DA neurons and the cortico-BG (CBG) circuits, referred to as the “DA=RPE” hypothesis here, has been guiding research on reward/reinforcement learning and value-based decision-making (Montague et al, 2004; O’Doherty et al, 2007; Rangel et al, 2008; Glimcher, 2011).
