Abstract

Learning from reward feedback is essential for survival but can become extremely challenging with myriad choice options. Here, we propose that learning reward values of individual features can provide a heuristic for estimating reward values of choice options in dynamic, multi-dimensional environments. We hypothesize that this feature-based learning occurs not just because it can reduce dimensionality, but more importantly because it can increase adaptability without compromising precision of learning. We experimentally test this hypothesis and find that in dynamic environments, human subjects adopt feature-based learning even when this approach does not reduce dimensionality. Even in static, low-dimensional environments, subjects initially adopt feature-based learning and gradually switch to learning reward values of individual options, depending on how accurately objects’ values can be predicted by combining feature values. Our computational models reproduce these results and highlight the importance of neurons coding feature values for parallel learning of values for features and objects.

Highlights

  • Learning from reward feedback is essential for survival but can become extremely challenging with myriad choice options

  • The object-based learner directly estimates the reward values of individual objects via reward feedback, whereas the feature-based learner estimates the reward values of all feature instances, such as red, blue, square, or triangle. The latter is achieved by updating the reward values associated with all features of the object for which reward feedback is given (see the sketch after this list)

  • To examine how the performance of the two learners depends on the reward statistics in the environment, we varied the relationship between the reward value of each object and the reward values of its features in order to generate multiple environments, each with a different level of generalizability
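
The contrast between the two learners can be made concrete with a short sketch. The Python snippet below is a minimal illustration under assumed settings, not the authors' code: the feature names, the learning rate, and the rule of averaging feature values to estimate an object's value are illustrative choices. The object-based learner applies a delta-rule update to the value of the chosen object only, whereas the feature-based learner applies the same update to the value of every feature of the chosen object.

    # Minimal sketch (not the authors' code): objects are defined by a color and a shape.
    COLORS = ["red", "blue", "green"]
    SHAPES = ["square", "triangle", "circle"]
    ALPHA = 0.1  # learning rate (hypothetical value)

    # Object-based learner: one value estimate per object, i.e., per (color, shape) pair.
    object_values = {(c, s): 0.5 for c in COLORS for s in SHAPES}

    # Feature-based learner: one value estimate per feature instance.
    feature_values = {f: 0.5 for f in COLORS + SHAPES}

    def update_object_learner(chosen, reward):
        # Delta-rule update of the chosen object's value only.
        object_values[chosen] += ALPHA * (reward - object_values[chosen])

    def update_feature_learner(chosen, reward):
        # Delta-rule update of the values of ALL features of the chosen object.
        for feature in chosen:  # e.g., both "red" and "square"
            feature_values[feature] += ALPHA * (reward - feature_values[feature])

    def feature_based_estimate(obj):
        # Combine feature values to estimate the object's value (here, by averaging).
        color, shape = obj
        return 0.5 * (feature_values[color] + feature_values[shape])

    # Example trial: the red square is chosen and rewarded.
    update_object_learner(("red", "square"), reward=1.0)
    update_feature_learner(("red", "square"), reward=1.0)
    print(feature_based_estimate(("blue", "square")))  # generalizes to unchosen objects

Note that after a single rewarded choice of the red square, the feature-based estimate for the blue square also increases because the two objects share the "square" feature, whereas the object-based learner's estimate for the blue square is unchanged.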

Introduction

Learning from reward feedback is essential for survival but can be extremely challenging in natural settings because choices have many features (e.g., color, shape, and texture), each of which can take different values, resulting in a large number of options for which reward values must be learned. This is referred to as the “curse of dimensionality,” because the standard reinforcement learning (RL) models used to simulate human learning do not scale with increasing dimensionality [1,2,3,4]. One alternative is to learn about features rather than whole options: for example, a child could evaluate fruits based on their color and texture and learn about these features when she consumes them. This heuristic feature-based learning is beneficial only if a generalizable set of rules exists for accurately constructing the reward value of all options by combining the reward values of their features. Could the benefits of feature-based learning overcome a lack of generalizable rules and still make this learning approach a viable heuristic? Currently, there is no single, unified framework for describing how such properties of the environment influence the learning strategy (e.g., feature-based vs. object-based).
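
To make the dimensionality and generalizability arguments concrete, the following toy sketch (my own construction, not the authors' task design; the numbers of features and the noise scale are arbitrary) counts how many values each learner must estimate and contrasts an environment in which object values are fully predicted by feature values with one in which object-specific deviations break that rule.

    import numpy as np

    rng = np.random.default_rng(0)
    n_colors, n_shapes = 4, 4

    # Number of values each learner must track for objects defined by two feature dimensions.
    print("object-based values to learn :", n_colors * n_shapes)   # 16
    print("feature-based values to learn:", n_colors + n_shapes)   # 8

    color_values = rng.uniform(0.0, 1.0, n_colors)
    shape_values = rng.uniform(0.0, 1.0, n_shapes)

    # Generalizable environment: each object's reward value is fully determined by
    # combining its feature values (here, by averaging), so feature-based learning
    # can recover every object's value exactly.
    generalizable = 0.5 * (color_values[:, None] + shape_values[None, :])

    # Less generalizable environment: object-specific deviations (arbitrary scale)
    # that no combination of feature values can capture.
    non_generalizable = generalizable + rng.normal(0.0, 0.3, (n_colors, n_shapes))

With more feature dimensions the gap grows quickly (e.g., four dimensions with four instances each give 256 object values versus 16 feature values), which is why feature-based learning can remain attractive even when its value estimates are only approximate.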
