Abstract

Learning appropriate representations of the reward environment is challenging in the real world, where there are many options, each with multiple attributes or features. Despite the existence of alternative solutions to this challenge, the neural mechanisms underlying the emergence and adoption of value representations and learning strategies remain unknown. To address this, we measured learning and choice during a multi-dimensional probabilistic learning task in humans and trained recurrent neural networks (RNNs) to capture our experimental observations. We find that human participants estimate stimulus-outcome associations by learning and combining estimates of reward probabilities associated with the informative feature, followed by those of informative conjunctions. By analyzing the representations, connectivity, and lesioning of the trained RNNs, we demonstrate that this mixed learning strategy relies on a distributed neural code and on opponency between excitatory and inhibitory neurons through value-dependent disinhibition. Together, our results suggest computational and neural mechanisms underlying the emergence of complex learning strategies in naturalistic settings.
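
To illustrate the form of the mixed learning strategy described above, the following is a minimal sketch in Python, not the authors' actual model: two Rescorla-Wagner learners run in parallel, one over values of the informative feature and one over informative conjunctions, and their estimates are combined for choice. The task dimensions, learning rate, inverse temperature, and mixing weight below are all hypothetical.

import numpy as np

rng = np.random.default_rng(seed=0)

n_values = 3   # hypothetical: 3 values per feature dimension
alpha = 0.1    # learning rate (assumed)
beta = 5.0     # softmax inverse temperature (assumed)
w = 0.5        # weight mixing feature- and conjunction-based estimates (assumed)

# Reward-probability estimates, initialized at chance (0.5):
V_feat = np.full(n_values, 0.5)              # one estimate per value of the informative feature
V_conj = np.full((n_values, n_values), 0.5)  # one estimate per informative conjunction

def stimulus_value(stim):
    # A stimulus is coded by (informative-feature value, conjunction indices).
    f, c1, c2 = stim
    return (1 - w) * V_feat[f] + w * V_conj[c1, c2]

def choose(stim_a, stim_b):
    # Logistic (softmax) choice between two stimuli based on mixed value estimates.
    p_a = 1.0 / (1.0 + np.exp(-beta * (stimulus_value(stim_a) - stimulus_value(stim_b))))
    return stim_a if rng.random() < p_a else stim_b

def update(stim, reward):
    # Rescorla-Wagner updates applied in parallel to both value systems.
    f, c1, c2 = stim
    V_feat[f] += alpha * (reward - V_feat[f])
    V_conj[c1, c2] += alpha * (reward - V_conj[c1, c2])

In the account given in the abstract, conjunction-based estimates come online after feature-based ones ("followed by those of informative conjunctions"); the fixed weight w here is a simplification for brevity, whereas a time-varying or learned arbitration weight would capture that progression.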

Highlights

  • Learning appropriate representations of the reward environment is challenging in the real world where there are many options, each with multiple attributes or features

  • We find that participants estimate stimulus-outcome associations by learning and combining estimates of reward probabilities associated with the informative feature followed by those of informative conjunctions, and this behavior is replicated by trained recurrent neural networks (RNNs)

  • We show that the observed mixed learning strategy relies on a distributed neural code and distinct contributions of excitatory and inhibitory neurons


Introduction

Learning appropriate representations of the reward environment is challenging in the real world, where there are many options, each with multiple attributes or features. Learning from an unpleasant reaction to consuming a multi-ingredient meal, for example, requires having representations for the reward value (i.e., the subjective reward experience associated with selection and consumption) and/or the predictive value of the individual ingredients, or combinations of ingredients, that resulted in the outcome (informative attributes). Learning such informative attributes and their associated value representations is challenging because feedback is non-specific and scarce (e.g., a stomachache after a meal with a combination of ingredients that may never recur), and it is unclear which attributes or combinations of attributes are important for predicting outcomes and must be learned [1]. Distributed representations allow for more flexibility, making them plausible candidates for learning appropriate representations in high-dimensional reward environments. It is currently unknown how multiple value representations and learning strategies emerge over time, or what the underlying neural mechanisms are.
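
To make the dimensionality problem concrete, a back-of-the-envelope count with hypothetical numbers: with m feature dimensions, each taking n values, tracking the value of every distinct object requires n**m estimates, whereas tracking feature values requires only m * n.

# Hypothetical count of value estimates a learner must maintain:
m, n = 4, 5
print(n**m)   # 625 object-value estimates
print(m * n)  # 20 feature-value estimates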

