Abstract

Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning’s temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking. This is because the implied activity is persistently predictable by preceding stimuli, and so cannot arise as this sort of prediction error. Here, we explore three possible accounts of such ramping signals: (a) the resolution of uncertainty about the timing of action; (b) the direct influence of dopamine over mechanisms associated with making choices; and (c) a new model of discounted vigour. Collectively, these suggest that dopamine ramps may be explained, with only minor disturbance, by standard theoretical ideas, though urgent questions remain regarding their proximal cause. We suggest experimental approaches to disentangling which of the proposed mechanisms are responsible for dopamine ramps.

Highlights

  • Ideas from the field of reinforcement learning (RL) have played an important role in neuroscientific theories of how animals choose actions to gain rewards and avoid punishments

  • Theory and experiments suggest that activity of dopamine-containing neurons resembles a temporally sophisticated prediction error used to learn expectations of future reward

  • This account would appear to be inconsistent with recent observations of ‘ramps’, i.e., gradual increases in extracellular dopamine concentration prior to the execution of actions or the acquisition of rewards

Summary

Introduction

Ideas from the field of reinforcement learning (RL) have played an important role in neuroscientific theories of how animals choose actions to gain rewards and avoid punishments. It has been suggested [1, 2] that the phasic responses of midbrain dopaminergic neurons resemble a temporal difference (TD) error, a learning signal which facilitates prediction and control of rewarding events [3, 4]. Recent reports of ramp-like increases in dopamine concentration preceding self-initiated instrumental responses [8, 9, 10, 11, 12, 13] and during approach to spatial locations associated with reward [14] appear to pose a challenge to established thinking. The central issue for TD accounts of dopamine is why such ramping should be observed at all, since TD provides a mechanism for predicting away later dopaminergic activity by earlier events, as in the case of the transfer of activity from the time of reward to the time of predictive cues.
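
The transfer of the TD error from the time of reward to the time of a predictive cue can be illustrated with a minimal tabular TD(0) sketch. This is not the authors' model: it assumes a tapped-delay-line state representation (one state per time step after cue onset), a pre-cue baseline whose value is fixed at zero because the cue is unpredictable, and hypothetical parameter names and values (`n_trials`, `n_steps`, `alpha`, `gamma`).

```python
def simulate_td(n_trials=500, n_steps=10, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a cue -> delay -> reward trial (illustrative only).

    Returns the TD errors on the first and last trial. Index 0 is the error
    at cue onset; index n_steps is the error at reward delivery.
    """
    # One value per post-cue time step; values[n_steps] is the terminal state.
    values = [0.0] * (n_steps + 1)
    first = last = None
    for trial in range(n_trials):
        # Cue onset: transition from the baseline (value assumed 0, never
        # updated, since the cue arrives unpredictably) into state 0.
        deltas = [gamma * values[0] - 0.0]
        for t in range(n_steps):
            # A unit reward is delivered on leaving the final state.
            reward = 1.0 if t == n_steps - 1 else 0.0
            delta = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            deltas.append(delta)
        if trial == 0:
            first = deltas
        last = deltas
    return first, last


if __name__ == "__main__":
    first, last = simulate_td()
    print("first trial:", [round(d, 3) for d in first])
    print("last trial: ", [round(d, 3) for d in last])
```

Under these assumptions, the error on the first trial sits at reward delivery, and after learning it sits at cue onset with nothing in between. A sustained ramp between cue and reward is therefore not what this simple scheme produces, which is the tension the text describes.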
