Abstract

In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted on each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect learning rate but is equivalent to an increase in random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.

Highlights

  • All organisms need to make choices for their survival while being confronted with uncertainty in their environment

  • We first give a formal demonstration (Supporting Material) that, in a Q-learning model such as the one applied to our task, a reduction in the amplitude of phasic dopaminergic responses to rewards translates directly into an increase in the level of random exploration

  • The Q-value of the performed action is revised in proportion to the reward prediction error (RPE), so under dopamine blockade each update is exactly a fixed fraction of what it would be in the absence of the pharmacological manipulation
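The bullets above summarize the paper's formal result: if the reward signal driving the prediction error is rescaled by a factor κ, the learned Q-values are rescaled by the same factor, which inside a softmax is indistinguishable from multiplying the inverse temperature β by κ; that is, κ < 1 produces more random choices. A minimal sketch of this equivalence (all parameter values are illustrative, not taken from the paper):

```python
import math

def softmax(q, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    m = max(beta * v for v in q)                 # shift for numerical stability
    w = [math.exp(beta * v - m) for v in q]
    s = sum(w)
    return [x / s for x in w]

def learn(rewards, alpha, kappa=1.0):
    """Delta-rule value learning on a single arm; kappa rescales the reward
    signal, and hence every reward prediction error (RPE)."""
    q = 0.0
    for r in rewards:
        q += alpha * (kappa * r - q)             # RPE = kappa*r - q
    return q

alpha, beta, kappa = 0.3, 2.0, 0.5
rewards = [1, 0, 1, 1, 0, 1]
q_full = learn(rewards, alpha)                   # intact dopamine signal
q_scaled = learn(rewards, alpha, kappa)          # attenuated signal: exactly kappa * q_full

# Rescaled values under softmax(beta) give the same choice probabilities
# as the original values under a lower inverse temperature, softmax(beta * kappa):
qs = [0.6, 0.2, 0.1]
p_rescaled = softmax([kappa * v for v in qs], beta)
p_lower_beta = softmax(qs, beta * kappa)
```

Because the update rule is linear in the reward, the rescaling propagates unchanged through learning, so a weaker phasic signal and a flatter (more exploratory) softmax are behaviorally equivalent.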


Introduction

All organisms need to make choices for their survival while being confronted with uncertainty in their environment. We develop a three-armed bandit task in rats with varying levels of uncertainty to investigate how dopamine controls the level of exploration within an individual. We do this by examining the effects of dopamine blockade on learning and performance variables following injection, in different sessions, of various doses of flupenthixol, a D1/D2 receptor antagonist that should reduce the effect of both tonic and phasic dopamine activity. We then replicate these data with a reinforcement learning model (Q-learning) extended with forgetting, and verify our conclusions on a set of alternative models: a directed exploration model, an ε-greedy random exploration model, and a meta-learning model. This allows us to explicitly distinguish learning variables from exploration variables, and to show that dopamine activity is involved in controlling the level of random exploration rather than the learning rate.
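As a sketch of the kind of model the introduction describes, the following simulates Q-learning with forgetting on a three-armed bandit with softmax action selection. The forgetting rule (unchosen arms decay toward zero at rate `phi`) and all parameter values are illustrative assumptions, not the paper's fitted model:

```python
import math
import random

def softmax_choice(q, beta, rng):
    """Sample an action from softmax probabilities with inverse temperature beta."""
    m = max(q)
    w = [math.exp(beta * (v - m)) for v in q]
    r = rng.random() * sum(w)
    acc = 0.0
    for a, wi in enumerate(w):
        acc += wi
        if r <= acc:
            return a
    return len(q) - 1

def simulate(p_reward, n_trials, alpha, beta, phi, seed=0):
    """Q-learning with forgetting on a multi-armed bandit.

    p_reward : reward probability of each arm
    alpha    : learning rate for the chosen arm
    beta     : softmax inverse temperature (exploitation level)
    phi      : forgetting rate pulling unchosen Q-values back toward 0
    """
    rng = random.Random(seed)
    q = [0.0] * len(p_reward)
    choices = []
    for _ in range(n_trials):
        a = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (reward - q[a])          # delta-rule update on chosen arm
        for b in range(len(q)):
            if b != a:
                q[b] *= (1.0 - phi)              # unchosen values decay (forgetting)
        choices.append(a)
    return q, choices

# Illustrative run: one rich arm (p = 0.8) against two lean arms (p = 0.2).
q_final, choices = simulate([0.8, 0.2, 0.2], n_trials=500,
                            alpha=0.2, beta=3.0, phi=0.05)
```

Lowering `beta` in this sketch flattens the choice probabilities and increases random exploration without touching `alpha`, which is the dissociation the model fitting exploits.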

