Abstract

The exploration/exploitation tradeoff – pursuing a known reward vs. sampling from lesser-known options in the hope of finding a better payoff – is a fundamental aspect of learning and decision making. In humans, this has been studied using multi-armed bandit tasks. The same processes have also been studied using simplified probabilistic reversal learning (PRL) tasks with binary choices. Our investigations suggest that protocols previously used to explore PRL in mice may be beyond their cognitive capacities, with animals performing at a no-better-than-chance level. We sought a novel probabilistic learning task to improve behavioral responding in mice, whilst allowing the investigation of the exploration/exploitation tradeoff in decision making. To achieve this, we developed a two-lever operant chamber task with levers corresponding to different probabilities (high/low) of receiving a saccharin reward, reversing the reward contingencies associated with the levers once animals reached a threshold of 80% responding at the high-rewarding lever. We found that, unlike in existing PRL tasks, mice are able to learn and behave near-optimally with 80% high/20% low reward probabilities. Altering the reward contingencies towards equality showed that some mice maintained a preference for the high-rewarding lever with probabilities as close as 60% high/40% low. Additionally, we show that animal choice behavior can be effectively modelled using reinforcement learning (RL) models incorporating separate learning rates for positive and negative prediction errors, a perseveration parameter, and a noise parameter. This new decision task, coupled with RL analyses, provides improved access for investigating the neuroscience of the exploration/exploitation tradeoff in decision making.
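For concreteness, the model class described above can be sketched as a simple Q-learning agent. The following is a minimal sketch assuming one common parameterization (separate learning rates for positive and negative prediction errors, a softmax choice rule with a perseveration bonus, and a uniform lapse rate as the noise parameter); the parameter names and exact functional form are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def simulate_agent(p_reward, n_trials, alpha_pos, alpha_neg,
                   kappa, epsilon, beta, seed=0):
    """Two-lever RL agent with the four parameters named in the abstract.

    Assumed (illustrative) parameterization:
      alpha_pos / alpha_neg -- learning rates for positive / negative
                               prediction errors
      kappa                 -- perseveration bonus for the last-chosen lever
      epsilon               -- noise (lapse) rate: chance of a uniformly
                               random press
      beta                  -- softmax inverse temperature
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                 # learned value of each lever
    last_choice = None
    choices, rewards = [], []
    for _ in range(n_trials):
        logits = beta * q
        if last_choice is not None:
            logits[last_choice] += kappa       # perseveration: stickiness
        p = np.exp(logits - logits.max())
        p /= p.sum()
        p = (1 - epsilon) * p + epsilon / 2.0  # uniform lapse noise
        a = int(rng.choice(2, p=p))
        r = float(rng.random() < p_reward[a])
        delta = r - q[a]                       # reward prediction error
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
        last_choice = a
        choices.append(a)
        rewards.append(r)
    return choices, rewards

# Example: behaviour on an 80%/20% schedule with hypothetical parameters.
choices, rewards = simulate_agent((0.8, 0.2), 200, alpha_pos=0.4,
                                  alpha_neg=0.2, kappa=0.5,
                                  epsilon=0.05, beta=5.0)
```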

Highlights

  • To survive and thrive in an ever-changing world, both human and non-human animals must make a multitude of rapid decisions about how they interact with, and adapt to, the environment around them in order to optimize gains and minimize losses associated with their behaviors

  • We propose that this modified Probabilistic Reversal Learning (PRL) task with 80%/20% reward probabilities is suitable for subsequent studies investigating the neurobiology of reinforcement and reversal learning in mice

  • As in the probabilistic learning phase, animals remained in reversal learning until they achieved two sessions with at least 60 responses and greater than 80% of lever presses on the high-rewarding lever, as sketched below
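
That advancement criterion reduces to a simple check over recent session summaries. Here is a minimal sketch, assuming each session is summarized as a (total presses, high-lever presses) pair; the function name and data layout are illustrative:

```python
def reached_criterion(sessions, n_sessions=2, min_presses=60, threshold=0.80):
    """Check the advancement criterion described above.

    `sessions` is assumed to be a list of (total_presses, high_lever_presses)
    tuples, oldest first. Returns True if each of the last `n_sessions`
    sessions had at least `min_presses` responses, with more than `threshold`
    of presses on the high-rewarding lever.
    """
    recent = sessions[-n_sessions:]
    return (len(recent) == n_sessions and
            all(total >= min_presses and high / total > threshold
                for total, high in recent))

# Example: 62/70 (~89%) and 55/65 (~85%) on the high lever -> criterion met.
print(reached_criterion([(58, 50), (70, 62), (65, 55)]))  # True
```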


INTRODUCTION

To survive and thrive in an ever-changing world, both human and non-human animals must make a multitude of rapid decisions about how they interact with, and adapt to, the environment around them in order to optimize gains and minimize losses associated with their behaviors.

Together, these data indicate that mice perform the PRL task optimized for rats substantially worse than rats do, responding in a manner much closer to random. To account for this comparatively poor performance, other groups have developed simplified mouse probabilistic learning tasks (Ineichen et al., 2012) in which the high-value choice maintains an 80% probability of reward, but choosing the low-value option never provides reward. After sufficient training, mice can achieve this presentation of the PRL task, making between four and five of seven possible reversals in a 60-trial session, far more than the 1–2 reversals mice make in 400 trials of the rat-optimized PRL task. We presented animals with increasingly noisy reward probabilities (70%/30% and 60%/40% on the high/low rewarding levers, respectively) and found that some mice were able to discriminate and reverse in these more complex, uncertain environments, demonstrating that the task offers a gradient of cognitive difficulty.
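
To make the structure of this simplified task concrete, below is a minimal simulation sketch. The within-session reversal rule (contingencies swap after eight consecutive presses of the currently high lever, which is what makes seven reversals the ceiling in a 60-trial session) is inferred from the figures above and should be read as an assumption; the win-stay/lose-shift agent is purely illustrative.

```python
import numpy as np

def win_stay_lose_shift(history):
    """Illustrative agent: repeat a rewarded choice, switch otherwise."""
    if not history:
        return 0
    last_choice, last_reward = history[-1]
    return last_choice if last_reward else 1 - last_choice

def run_session(agent, n_trials=60, p_high=0.8, p_low=0.0,
                criterion_run=8, seed=0):
    """One session of the simplified mouse task described above.

    The reversal rule is an assumption inferred from the text: contingencies
    swap after `criterion_run` consecutive presses of the currently high
    lever, giving at most 7 reversals in 60 trials. Pass p_low=0.2 (or 0.3,
    0.4 with matching p_high) to model the noisier schedules.
    """
    rng = np.random.default_rng(seed)
    high = 0                       # which lever is currently high-rewarding
    run, reversals = 0, 0
    history = []
    for _ in range(n_trials):
        a = agent(history)
        p = p_high if a == high else p_low
        r = float(rng.random() < p)
        history.append((a, r))
        run = run + 1 if a == high else 0
        if run == criterion_run:   # criterion met: reverse the contingencies
            high, run, reversals = 1 - high, 0, reversals + 1
    return reversals

print(run_session(win_stay_lose_shift))  # reversals achieved in one session
```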

