Abstract

It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found sustained DA responses to predictable reward in tasks involving self-paced behavior, and have suggested that this response represents a motivational signal. We have previously shown that RPE can remain sustained if there is decay (forgetting) of learned values, which can be implemented as decay of the synaptic strengths that store those values. That account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that, counterintuitively, value-decay can enhance motivation; specifically, it facilitates fast goal-reaching. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards the goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because chosen values are continually updated whereas unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing the seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value learning and motivation. They also suggest that when biological systems for value learning remain active even after learning has apparently converged, those systems may be in a state of dynamic equilibrium in which learning and forgetting are balanced.
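
To make the setting concrete, here is a minimal Python sketch of the kind of agent described above: Q-learning over a series of ‘Go’/‘No-Go’ selections towards a rewarded goal, with every stored value decaying slightly at each time step. The parameter values, the task size, and the exact ordering of the decay and update steps are illustrative assumptions, not the paper's settings.

    # Minimal sketch (not the authors' code): Q-learning with value-decay
    # ("forgetting") in a self-paced Go/No-Go approach task.
    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES = 7               # states S0..S6; reward on reaching the last state
    ALPHA, GAMMA = 0.5, 0.97   # learning rate, temporal discount factor
    PHI = 0.01                 # assumed per-step decay (forgetting) rate
    BETA = 5.0                 # inverse temperature for softmax choice
    REWARD = 1.0

    Q = np.zeros((N_STATES, 2))  # action values: column 0 = 'No-Go', 1 = 'Go'

    def softmax_choice(q):
        p = np.exp(BETA * (q - q.max()))
        p /= p.sum()
        return rng.choice(2, p=p)

    for episode in range(500):
        s = 0
        while s < N_STATES - 1:
            a = softmax_choice(Q[s])
            s_next = s + 1 if a == 1 else s        # 'Go' advances, 'No-Go' stays
            r = REWARD if s_next == N_STATES - 1 else 0.0
            # Q-learning RPE: delta = r + gamma * max_a' Q(s',a') - Q(s,a)
            delta = r + GAMMA * Q[s_next].max() - Q[s, a]
            q_new = Q[s, a] + ALPHA * delta
            Q *= (1.0 - PHI)   # ALL stored values decay at every time step...
            Q[s, a] = q_new    # ...while only the chosen value is re-learned
            s = s_next

    print(np.round(Q, 3))  # expect 'Go' values rising towards the goal

Because all stored values decay while only the chosen one is re-learned, chosen ‘Go’ values stay above unchosen ‘No-Go’ values (the value-contrast), and the RPE remains positive even after extensive training, mirroring the sustained DA-like signal described above.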

Highlights

  • Electrophysiological [1] and fast-scan cyclic voltammetry (FSCV) [2, 3] studies have conventionally shown that dopamine (DA) neuronal activity and transmitter release respond to unpredicted but not predicted reward, consistent with the suggestion that DA represents reward-prediction-error (RPE) [1, 4]

  • In the results presented so far, we assumed in the model that RPE is calculated according to a major reinforcement learning (RL) algorithm called Q-learning [28] (Eq (1) in the Materials and Methods; the standard form is reproduced after this list), based on the empirical suggestions that DA neuronal activity in the rat ventral tegmental area (VTA) and DA concentration in the nucleus accumbens represent a Q-learning-type RPE [21, 29]

  • The underlying potential mechanisms turned out to be twofold: (1) a value gradient towards the goal is shaped by value-decay-induced sustained positive RPE, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because chosen values are continually updated whereas unchosen values decay
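
For reference, the Q-learning RPE cited in the second highlight has the standard form [28] (reconstructed here from the textbook algorithm; the paper's exact Eq (1) may differ in notation, and the decay rate \(\varphi\) is our label for exposition):

\[
\delta_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t), \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,\delta_t ,
\]

with value-decay ("forgetting") additionally applying \(Q(s, a) \leftarrow (1 - \varphi)\, Q(s, a)\) at every time step to values that are not being updated.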


Introduction

Electrophysiological [1] and fast-scan cyclic voltammetry (FSCV) [2, 3] studies have conventionally shown that dopamine (DA) neuronal activity and transmitter release respond to unpredicted but not predicted reward, consistent with the suggestion that DA represents reward-prediction-error (RPE) [1, 4]. DA has also been implicated in motivation. As for the underlying synaptic/circuit mechanisms, however, much progress has been made for the role as RPE but not for the role as a motivational drive: how RPE is calculated upstream of DA neurons, and how released DA implements RPE-dependent updates of state/action values through synaptic plasticity, have been clarified [17, 18, 19, 20], whereas both the upstream and downstream mechanisms of DA's motivational role remain more elusive
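
As a deliberately simplified illustration of the "downstream" mechanism just mentioned, an RPE-dependent value update gated by DA can be written as a three-factor plasticity rule; the sketch below is an expository assumption, not a circuit model established by the cited studies:

    # Illustrative three-factor plasticity rule (an expository assumption,
    # not the paper's circuit model): a synaptic weight w storing a learned
    # value changes in proportion to presynaptic activity x (eligibility)
    # and the DA-broadcast RPE delta, while also passively decaying.
    def update_weight(w: float, x: float, delta: float,
                      alpha: float = 0.1, phi: float = 0.01) -> float:
        w = (1.0 - phi) * w     # passive decay of the stored value (forgetting)
        w += alpha * delta * x  # DA-gated, RPE-proportional plasticity
        return w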
