Dopamine transients encode reward prediction errors independent of learning rates.

Andrew Mah, Carla E M Golden, Christine M Constantinople

doi:10.1101/2024.04.18.590090

Abstract

Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented in synaptic weights in the striatum, and updated by dopamine-dependent plasticity, suggesting that dopamine release might reflect the product of the learning rate and RPE. Here, we leveraged the fact that animals learn faster in volatile environments to characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc). We trained rats on a task with semi-observable states offering different rewards, and rats adjusted how quickly they initiated trials across states using RPEs. Computational modeling and behavioral analyses showed that learning rates were higher following state transitions, and scaled with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encoded RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
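
The relationship between RPE, learning rate, and value update described above can be illustrated with a minimal sketch. It assumes a simple delta-rule learner; this is not the authors' model, and the function and variable names are hypothetical. It shows that the same RPE can produce different value updates depending on the learning rate, which is the distinction the abstract draws between a signal that encodes the RPE alone and one that encodes the product of the learning rate and RPE.

```python
def delta_rule_update(value, reward, learning_rate):
    """One delta-rule step (illustrative only, not the paper's model).

    rpe       = reward - value
    new_value = value + learning_rate * rpe
    """
    rpe = reward - value
    new_value = value + learning_rate * rpe
    return new_value, rpe


# Same starting value and reward, so the RPE is identical in both cases.
stable_value, stable_rpe = delta_rule_update(value=10.0, reward=20.0, learning_rate=0.1)
volatile_value, volatile_rpe = delta_rule_update(value=10.0, reward=20.0, learning_rate=0.5)

# The RPE does not depend on the learning rate, but the size of the update does.
print(stable_rpe, volatile_rpe)      # identical RPEs
print(stable_value, volatile_value)  # the higher learning rate yields a larger update
```

In this sketch, a signal that tracked only `rpe` would be the same in both cases, whereas a signal that tracked `learning_rate * rpe` would be five times larger in the second case.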

Journal: bioRxiv : the preprint server for biology
Publication Date: Aug 19, 2024
License type: cc-by-nc-nd
