Abstract

For decades, behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive. More recently, many reinforcement learning (RL) models have been developed to explain choice by integrating reward feedback over time. Despite reasonable success of RL models in capturing choice on a trial-by-trial basis, these models cannot capture variability in matching behavior. To address this, we developed metrics based on information theory and applied them to choice data from dynamic learning tasks in mice and monkeys. We found that a single entropy-based metric can explain 50% and 41% of variance in matching in mice and monkeys, respectively. We then used limitations of existing RL models in capturing entropy-based metrics to construct more accurate models of choice. Together, our entropy-based metrics provide a model-free tool to predict adaptive choice behavior and reveal underlying neural mechanisms.
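
To make the idea of an entropy-based metric concrete, here is a minimal, hedged sketch in Python. It estimates the conditional entropy of stay/switch decisions given the previous outcome (win or loss), one simple information-theoretic summary of local response to reward; the paper's actual metric definitions may differ, and the function name and interface are illustrative assumptions.

```python
# Illustrative sketch only: a conditional-entropy metric over stay/switch
# strategies, assuming binary choices (0/1) and binary rewards (0/1).
# This is not necessarily the paper's exact metric definition.
import numpy as np

def strategy_entropy_given_outcome(choices, rewards):
    """H(stay/switch | previous outcome) in bits, estimated from one session."""
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    stay = (choices[1:] == choices[:-1]).astype(int)  # 1 = stay, 0 = switch
    prev_outcome = rewards[:-1]                        # reward on the preceding trial

    h = 0.0
    for outcome in (0, 1):                             # 0 = loss, 1 = win
        mask = prev_outcome == outcome
        if not mask.any():
            continue
        p_outcome = mask.mean()
        p_stay = stay[mask].mean()
        for p in (p_stay, 1.0 - p_stay):
            if p > 0:
                h -= p_outcome * p * np.log2(p)
    return h

# A perfect win-stay/lose-switch agent is fully predictable from the previous
# outcome, so this entropy is 0; random stay/switch behavior gives ~1 bit.
rng = np.random.default_rng(0)
rewards = rng.integers(0, 2, 500)
choices = np.empty(500, dtype=int)
choices[0] = 0
for t in range(1, 500):
    choices[t] = choices[t - 1] if rewards[t - 1] == 1 else 1 - choices[t - 1]
print(strategy_entropy_given_outcome(choices, rewards))  # -> 0.0
```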

Highlights

  • Behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive

  • We use the shortcomings of models based purely on reinforcement learning (RL) in capturing the pattern of entropy-based metrics in our data to construct multicomponent models that integrate reward- and option-dependent strategies with standard RL models. We show that these models can capture both trial-by-trial choice data and global choice behavior better than existing models, revealing additional mechanisms involved in adaptive learning and decision making

  • We found that the median predicted ERODSW− was significantly higher than the median observed ERODSW−, suggesting the RL2 model underutilizes loss-dependent and option-dependent strategies compared to mice and monkeys in our experiments (Fig. 5c, d); a hedged sketch of one such reward- and option-dependent entropy follows these highlights
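
The exact definition of ERODSW− is not reproduced on this page; one plausible reading is an entropy of reward- and option-dependent strategy evaluated after unrewarded (loss) trials. Under that assumption, and again assuming binary options and outcomes, the hedged sketch below conditions stay/switch decisions on the previously chosen option while restricting to loss trials; the function name and interface are illustrative, not the paper's.

```python
# Hedged sketch: conditional entropy of stay/switch given the previously chosen
# option, restricted to trials following a loss. One plausible reading of a
# reward- and option-dependent entropy after losses, not the verbatim ERODSW−.
import numpy as np

def strategy_entropy_after_loss_by_option(choices, rewards):
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    stay = (choices[1:] == choices[:-1]).astype(int)   # 1 = stay, 0 = switch
    prev_choice = choices[:-1]
    after_loss = rewards[:-1] == 0                      # keep decisions made after a loss

    stay, prev_choice = stay[after_loss], prev_choice[after_loss]
    h = 0.0
    for option in (0, 1):
        mask = prev_choice == option
        if not mask.any():
            continue
        p_option = mask.mean()                          # P(previous option | loss)
        p_stay = stay[mask].mean()
        for p in (p_stay, 1.0 - p_stay):
            if p > 0:
                h -= p_option * p * np.log2(p)
    return h
```

Computing such a quantity both for choices simulated from a fitted model and for the animals' observed choices, session by session, is the kind of predicted-versus-observed comparison the highlight describes.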


Introduction

Behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive. Models proposed to explain matching behavior include learning the reward-independent rate of choosing each option[15], adopting win-stay lose-switch (WSLS) policies[27,28], or learning on multiple timescales[31]. Although these models all provide compelling explanations of the emergence of matching behavior, it remains unclear how they compare in terms of fitting local choice behavior and the extent to which they replicate observed variability in matching behavior. We use the shortcomings of models based purely on RL in capturing the pattern of entropy-based metrics in our data to construct multicomponent models that integrate reward- and option-dependent strategies with standard RL models. We show that these models can capture both trial-by-trial choice data and global choice behavior better than existing models, revealing additional mechanisms involved in adaptive learning and decision making.
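
As a concrete, hedged illustration of a multicomponent model of this kind, the sketch below combines a standard reward-prediction-error value update with an explicit win-stay/lose-switch (option- and outcome-dependent) bias in the choice rule. The parameterization (alpha, beta, w_wsls) and the additive combination are assumptions for illustration, not the paper's fitted model.

```python
# Hedged sketch of a multicomponent choice model: RL value learning plus a
# win-stay/lose-switch (WSLS) bias. Parameterization is illustrative only.
import numpy as np

def simulate_hybrid_agent(reward_probs, n_trials=1000, alpha=0.2, beta=5.0,
                          w_wsls=1.0, seed=0):
    """Simulate a two-option task; reward_probs maps trial -> (p0, p1)."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                        # learned option values
    last_choice, last_reward = None, None
    choices, rewards = [], []

    for t in range(n_trials):
        bias = np.zeros(2)
        if last_choice is not None:
            # option- and outcome-dependent component: repeat the previous
            # option after a win, favor the other option after a loss
            target = last_choice if last_reward else 1 - last_choice
            bias[target] = w_wsls
        logits = beta * q + bias
        p_choose_1 = 1.0 / (1.0 + np.exp(logits[0] - logits[1]))
        c = int(rng.random() < p_choose_1)
        r = int(rng.random() < reward_probs(t)[c])
        q[c] += alpha * (r - q[c])         # reward-prediction-error update
        choices.append(c)
        rewards.append(r)
        last_choice, last_reward = c, r
    return np.array(choices), np.array(rewards)

# Example: a block reversal halfway through the session
probs = lambda t: (0.4, 0.1) if t < 500 else (0.1, 0.4)
choices, rewards = simulate_hybrid_agent(probs)
print("fraction of trials choosing option 1:", choices.mean())
```

Increasing w_wsls makes the agent's stay/switch behavior more predictable from the previous option and outcome, which lowers entropy-based metrics of the kind sketched above; a pure RL agent (w_wsls = 0) lacks this component.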

