Abstract

Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes than relying on learning strategies that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In this study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task. We tested the hypothesis that task-naive subjects show enhanced learning of feature-specific reward associations by switching to an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis, we designed a decision-making task in which subjects received probabilistic feedback following choices between pairs of stimuli. Trials were grouped into two block contexts: in one type of block there was no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, whereas, following an uncued transition, alternating blocks had outcomes linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Consistent with the predicted model mechanism, these subjects showed significantly enhanced performance in feature-reward blocks and rapidly switched their choice strategy to abstract feature rules when reward contingencies changed. The choice behavior of the remaining subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context-specific selections to drive responses.
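
As a concrete illustration of the mechanism described above, the following is a minimal Python sketch of a feature-rule learner of this kind: model-free value updates for each stimulus feature, combined with a belief over which feature dimension is currently relevant that restricts choice to that dimension. All names and parameter values here (FeatureRuleLearner, alpha, beta, rule_lr, the toy reward contingency) are hypothetical placeholders, not the study's actual model specification.

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureRuleLearner:
    """Illustrative hierarchical feature-rule learner (hypothetical parameters).

    q[d, v] is the learned reward value of feature value v on dimension d
    (e.g. d = 0 for color, d = 1 for shape); w[d] is the belief that
    dimension d is the currently relevant one.
    """

    def __init__(self, n_dims=2, n_vals=3, alpha=0.3, beta=5.0, rule_lr=0.2):
        self.q = np.full((n_dims, n_vals), 0.5)   # per-feature values
        self.w = np.full(n_dims, 1.0 / n_dims)    # belief over relevant dimension
        self.alpha, self.beta, self.rule_lr = alpha, beta, rule_lr

    def choose(self, left, right):
        """left/right: per-dimension feature indices of the two stimuli.
        The rule restricts valuation to the dimension believed relevant."""
        d = int(np.argmax(self.w))
        q_pair = np.array([self.q[d, left[d]], self.q[d, right[d]]])
        p = np.exp(self.beta * q_pair)
        p /= p.sum()                               # softmax choice probabilities
        return int(rng.random() > p[0])            # 0 = left, 1 = right

    def update(self, chosen, reward):
        """Model-free delta-rule update of the chosen stimulus's feature
        values; dimensions that predicted the outcome well gain rule credit."""
        for d in range(len(self.w)):
            pe = reward - self.q[d, chosen[d]]
            self.q[d, chosen[d]] += self.alpha * pe
            self.w[d] += self.rule_lr * ((1.0 - abs(pe)) - self.w[d])
        self.w = np.clip(self.w, 1e-3, None)
        self.w /= self.w.sum()

# Toy usage: color index 0 is rewarded 80% of the time (hypothetical contingency).
agent = FeatureRuleLearner()
for _ in range(100):
    left, right = [0, 2], [1, 0]                   # [color index, shape index]
    side = agent.choose(left, right)
    chosen = left if side == 0 else right
    reward = 1.0 if (chosen[0] == 0 and rng.random() < 0.8) else 0.0
    agent.update(chosen, reward)
```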

Highlights

  • We show that average choice behavior across subjects is best explained by a reinforcement learning model that identifies the current task context and applies a selection rule that associates stimuli by feature type and restricts stimulus selection to the relevant stimulus feature

  • We developed a set of predictive behavioral models using the reinforcement learning framework, which allowed us to fit the choices of each subject to a unique model, separating subjects that utilize advantageous rule-driven behavior from those that do not
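
To make the per-subject fitting procedure concrete, the following is a minimal sketch of maximum-likelihood fitting for one candidate model, a basic two-parameter model-free Q-learner, with BIC for comparison across models. The function names, starting values, and bounds (nll_q_learner, fit_subject) are illustrative assumptions rather than the study's actual pipeline.

```python
import numpy as np
from scipy.optimize import minimize

def nll_q_learner(params, choices, rewards, n_options=2):
    """Negative log-likelihood of a simple model-free Q-learner.
    params = (alpha, beta); choices are option indices; rewards are 0/1."""
    alpha, beta = params
    q = np.full(n_options, 0.5)
    nll = 0.0
    for c, r in zip(choices, rewards):
        logits = beta * q
        logits -= logits.max()                       # numerical stability
        nll -= logits[c] - np.log(np.exp(logits).sum())
        q[c] += alpha * (r - q[c])                   # delta-rule update
    return nll

def fit_subject(choices, rewards):
    """Maximum-likelihood fit of one subject's choices.
    BIC = 2*NLL + k*ln(n) supports comparison against alternative models."""
    res = minimize(nll_q_learner, x0=[0.3, 3.0], args=(choices, rewards),
                   bounds=[(1e-3, 1.0), (1e-2, 20.0)])
    k, n = len(res.x), len(choices)
    return res.x, 2.0 * res.fun + k * np.log(n)
```

Each subject would then be assigned to whichever candidate model (for example, this simple value learner versus a feature-rule learner) attains the lowest BIC on that subject's choices.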

Introduction

Successful behavior in new environments benefits from leveraging learning from previous experience in the form of abstract rules (mappings of contexts, stimuli, actions, and outcomes), even though it is often difficult to know which rule is relevant to the current context (Miller, 2000; Gershman et al., 2010a; Buschman et al., 2012; Chumbley et al., 2012; Collins and Frank, 2013; Collins et al., 2014). There is significant continuity across everyday decision-making contexts that enables positive transfer of previously learned rules, and humans work hard to pattern their living and working environments so as to provide continuity, with contextual cues indicating the relevant rule to apply (Gershman et al., 2010b; Collins et al., 2014). Applying a learned rule can nonetheless fail, either because it is unclear which rule applies or because an appropriate rule for the current context has not yet been learned.
