Abstract
When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these 'non-greedy' decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here using reinforcement learning models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by blood oxygen level-dependent responses to obtained rewards in the dorsal anterior cingulate cortex and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus-norepinephrine system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.