Abstract

To flexibly adapt to the demands of their environment, animals are constantly exposed to the conflict resulting from having to choose between predictably rewarding familiar options (exploitation) and risky novel options, the value of which essentially consists of obtaining new information about the space of possible rewards (exploration). Despite extensive research, the mechanisms by which animals solve this exploitation-exploration dilemma are still poorly understood. Here, we investigate human decision-making in a gambling task in which the informational value of each trial and the reward potential were separately manipulated. To better characterize the mechanisms underlying the observed behavioural choices, we introduce a computational model that augments the standard reward-based reinforcement learning formulation by associating a value to information. We find that both reward and information gained during learning influence the balance between exploitation and exploration, and that this influence depends on the reward context. Our results shed light on the mechanisms that underpin decision-making under uncertainty and suggest new approaches for investigating the exploration-exploitation dilemma throughout the animal kingdom.

Highlights

  • Animals constantly face a conflict between choosing predictably rewarding familiar options (exploitation) and risky novel options whose value consists chiefly in the new information they provide about the space of possible rewards (exploration)

  • The kRL model predicts more frequent selection of the never-experienced deck (0seen), whereas standard Reinforcement Learning (sRL) predicts more frequent selection of the options associated with the highest expected reward (illustrated in the first sketch after this list)

  • Pairwise comparisons using paired t-tests showed significant differences between all conditions (illustrated in the second sketch after this list)
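
To make the contrast between the two models concrete, here is a minimal sketch in Python of how an information bonus changes choice probabilities. All numbers, the parameter names (omega, beta), and the count-based form of the information term are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def softmax(values, beta):
        """Turn option values into choice probabilities."""
        e = np.exp(beta * (values - values.max()))
        return e / e.sum()

    # Hypothetical state after forced sampling: two decks sampled, one never seen.
    Q = np.array([6.0, 4.0, 5.0])   # expected reward estimates (unseen deck keeps its prior)
    I = np.array([4.0, 4.0, 0.0])   # accumulated information; the "0seen" deck has none

    omega = 4.0   # hypothetical weight of the information term
    beta = 1.0    # softmax inverse temperature

    p_sRL = softmax(Q, beta)                       # reward only: favors the highest-Q deck
    p_kRL = softmax(Q + omega / (1.0 + I), beta)   # reward + information: boosts the unseen deck

    print("sRL:", np.round(p_sRL, 3))   # highest probability on deck 0 (highest expected reward)
    print("kRL:", np.round(p_kRL, 3))   # highest probability on the never-seen deck 2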

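And a minimal illustration of the pairwise comparisons, using scipy's paired t-test on made-up per-participant data; the condition names and values are placeholders, not the paper's measures.

    import numpy as np
    from scipy import stats

    # Hypothetical per-participant choice frequencies under three conditions.
    rng = np.random.default_rng(1)
    cond_a = rng.normal(0.50, 0.05, size=20)
    cond_b = rng.normal(0.35, 0.05, size=20)
    cond_c = rng.normal(0.15, 0.05, size=20)

    # All pairwise comparisons with paired t-tests (same participants in each condition).
    for name, x, y in [("a vs b", cond_a, cond_b),
                       ("a vs c", cond_a, cond_c),
                       ("b vs c", cond_b, cond_c)]:
        t, p = stats.ttest_rel(x, y)
        print(f"{name}: t = {t:.2f}, p = {p:.3g}")
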

Introduction

To flexibly adapt to the demands of their environment, animals are constantly exposed to the conflict resulting from having to choose between predictably rewarding familiar options (exploitation) and risky novel options, the value of which essentially consists of obtaining new information about the space of possible rewards (exploration). Some studies have failed to observe directed exploration in humans[5,6,7], possibly because they used tasks (i.e., “bandit tasks”) in which directed and random exploratory strategies are difficult to identify separately[8], owing to the confound between information and reward[2]. To overcome this limitation, Wilson et al.[2] introduced a task in which the information available about each option is manipulated independently of its expected reward. In the kRL model, choice values are updated trial by trial: the process is iterated for the length of the free-choice task, and at the beginning of each game expected reward values are initialized to Q0 and information values to zero. A sketch of this loop is given below.
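
A minimal sketch of that loop, assuming a simple delta-rule update for the reward estimates and a count-like accumulator for information; the constants and the functional form of the information bonus are illustrative assumptions, not the paper's exact equations.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative constants (names and values are assumptions).
    N_DECKS, N_TRIALS = 3, 20
    Q0 = 5.0      # initial expected reward for every deck
    ALPHA = 0.3   # learning rate of the delta-rule update
    OMEGA = 4.0   # weight of the information term
    BETA = 1.0    # softmax inverse temperature
    TRUE_MEANS = np.array([6.0, 4.0, 5.0])   # made-up generative reward means

    def softmax(values, beta):
        """Turn option values into choice probabilities."""
        e = np.exp(beta * (values - values.max()))
        return e / e.sum()

    # At the beginning of each game: expected rewards start at Q0, information at zero.
    Q = np.full(N_DECKS, Q0)
    I = np.zeros(N_DECKS)

    for t in range(N_TRIALS):   # iterate for the length of the free-choice task
        p = softmax(Q + OMEGA / (1.0 + I), BETA)    # value = expected reward + information bonus
        choice = rng.choice(N_DECKS, p=p)
        reward = rng.normal(TRUE_MEANS[choice], 1.0)
        Q[choice] += ALPHA * (reward - Q[choice])   # delta-rule update of the reward estimate
        I[choice] += 1.0                            # sampling a deck accumulates information about it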

