Abstract

For decades, behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive. More recently, many reinforcement learning (RL) models have been developed to explain choice by integrating reward feedback over time. Despite reasonable success of RL models in capturing choice on a trial-by-trial basis, these models cannot capture variability in matching behavior. To address this, we developed metrics based on information theory and applied them to choice data from dynamic learning tasks in mice and monkeys. We found that a single entropy-based metric can explain 50% and 41% of variance in matching in mice and monkeys, respectively. We then used limitations of existing RL models in capturing entropy-based metrics to construct more accurate models of choice. Together, our entropy-based metrics provide a model-free tool to predict adaptive choice behavior and reveal underlying neural mechanisms.
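
To make the idea of an entropy-based metric concrete, here is a minimal, hedged sketch in Python. It estimates the conditional entropy of stay/switch decisions given the previous outcome (win or loss), one simple information-theoretic summary of local response to reward; the paper's actual metric definitions may differ, and the function name and interface are illustrative assumptions.

```python
# Illustrative sketch only: a conditional-entropy metric over stay/switch
# strategies, assuming binary choices (0/1) and binary rewards (0/1).
# This is not necessarily the paper's exact metric definition.
import numpy as np

def strategy_entropy_given_outcome(choices, rewards):
    """H(stay/switch | previous outcome) in bits, estimated from one session."""
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    stay = (choices[1:] == choices[:-1]).astype(int)  # 1 = stay, 0 = switch
    prev_outcome = rewards[:-1]                        # reward on the preceding trial

    h = 0.0
    for outcome in (0, 1):                             # 0 = loss, 1 = win
        mask = prev_outcome == outcome
        if not mask.any():
            continue
        p_outcome = mask.mean()
        p_stay = stay[mask].mean()
        for p in (p_stay, 1.0 - p_stay):
            if p > 0:
                h -= p_outcome * p * np.log2(p)
    return h

# A perfect win-stay/lose-switch agent is fully predictable from the previous
# outcome, so this entropy is 0; random stay/switch behavior gives ~1 bit.
rng = np.random.default_rng(0)
rewards = rng.integers(0, 2, 500)
choices = np.empty(500, dtype=int)
choices[0] = 0
for t in range(1, 500):
    choices[t] = choices[t - 1] if rewards[t - 1] == 1 else 1 - choices[t - 1]
print(strategy_entropy_given_outcome(choices, rewards))  # -> 0.0
```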

Highlights

  • Behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive

  • We use the shortcomings of models based purely on reinforcement learning (RL) in capturing the pattern of entropy-based metrics in our data to construct multicomponent models that integrate reward- and option-dependent strategies with standard RL models. We show that these models can capture both trial-by-trial choice data and global choice behavior better than existing models, revealing additional mechanisms involved in adaptive learning and decision making

  • We found that the median predicted ERODSW− was significantly higher than the median observed ERODSW−, suggesting the RL2 model underutilizes loss-dependent and option-dependent strategies compared to mice and monkeys in our experiments (Fig. 5c, d); a hedged sketch of one such reward- and option-dependent entropy follows these highlights
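
The exact definition of ERODSW− is not reproduced on this page; one plausible reading is an entropy of reward- and option-dependent strategy evaluated after unrewarded (loss) trials. Under that assumption, and again assuming binary options and outcomes, the hedged sketch below conditions stay/switch decisions on the previously chosen option while restricting to loss trials; the function name and interface are illustrative, not the paper's.

```python
# Hedged sketch: conditional entropy of stay/switch given the previously chosen
# option, restricted to trials following a loss. One plausible reading of a
# reward- and option-dependent entropy after losses, not the verbatim ERODSW−.
import numpy as np

def strategy_entropy_after_loss_by_option(choices, rewards):
    choices = np.asarray(choices)
    rewards = np.asarray(rewards)
    stay = (choices[1:] == choices[:-1]).astype(int)   # 1 = stay, 0 = switch
    prev_choice = choices[:-1]
    after_loss = rewards[:-1] == 0                      # keep decisions made after a loss

    stay, prev_choice = stay[after_loss], prev_choice[after_loss]
    h = 0.0
    for option in (0, 1):
        mask = prev_choice == option
        if not mask.any():
            continue
        p_option = mask.mean()                          # P(previous option | loss)
        p_stay = stay[mask].mean()
        for p in (p_stay, 1.0 - p_stay):
            if p > 0:
                h -= p_option * p * np.log2(p)
    return h
```

Computing such a quantity both for choices simulated from a fitted model and for the animals' observed choices, session by session, is the kind of predicted-versus-observed comparison the highlight describes.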


Introduction

Behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive. Models proposed to explain matching behavior include learning the reward-independent rate of choosing each option[15], adopting win-stay lose-switch (WSLS) policies[27,28], or learning on multiple timescales[31]. Although these models all provide compelling explanations of the emergence of matching behavior, it remains unclear how they compare in terms of fitting local choice behavior and the extent to which they replicate observed variability in matching behavior. We use the shortcomings of models based purely on RL in capturing the pattern of entropy-based metrics in our data to construct multicomponent models that integrate reward- and option-dependent strategies with standard RL models. We show that these models can capture both trial-by-trial choice data and global choice behavior better than existing models, revealing additional mechanisms involved in adaptive learning and decision making.
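
As a concrete, hedged illustration of a multicomponent model of this kind, the sketch below combines a standard reward-prediction-error value update with an explicit win-stay/lose-switch (option- and outcome-dependent) bias in the choice rule. The parameterization (alpha, beta, w_wsls) and the additive combination are assumptions for illustration, not the paper's fitted model.

```python
# Hedged sketch of a multicomponent choice model: RL value learning plus a
# win-stay/lose-switch (WSLS) bias. Parameterization is illustrative only.
import numpy as np

def simulate_hybrid_agent(reward_probs, n_trials=1000, alpha=0.2, beta=5.0,
                          w_wsls=1.0, seed=0):
    """Simulate a two-option task; reward_probs maps trial -> (p0, p1)."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                        # learned option values
    last_choice, last_reward = None, None
    choices, rewards = [], []

    for t in range(n_trials):
        bias = np.zeros(2)
        if last_choice is not None:
            # option- and outcome-dependent component: repeat the previous
            # option after a win, favor the other option after a loss
            target = last_choice if last_reward else 1 - last_choice
            bias[target] = w_wsls
        logits = beta * q + bias
        p_choose_1 = 1.0 / (1.0 + np.exp(logits[0] - logits[1]))
        c = int(rng.random() < p_choose_1)
        r = int(rng.random() < reward_probs(t)[c])
        q[c] += alpha * (r - q[c])         # reward-prediction-error update
        choices.append(c)
        rewards.append(r)
        last_choice, last_reward = c, r
    return np.array(choices), np.array(rewards)

# Example: a block reversal halfway through the session
probs = lambda t: (0.4, 0.1) if t < 500 else (0.1, 0.4)
choices, rewards = simulate_hybrid_agent(probs)
print("fraction of trials choosing option 1:", choices.mean())
```

Increasing w_wsls makes the agent's stay/switch behavior more predictable from the previous option and outcome, which lowers entropy-based metrics of the kind sketched above; a pure RL agent (w_wsls = 0) lacks this component.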

