Abstract

Learning the structure of the world can be driven by reinforcement but also occurs incidentally through experience. Reinforcement learning theory has provided insight into how prediction errors drive belief updates, but less attention has been paid to the knowledge resulting from such learning. Here we contrast associative structures formed through reinforcement with those formed through experience of task statistics. BOLD neuroimaging in human volunteers demonstrates rigid representations of rewarded sequences in the temporal pole and posterior orbitofrontal cortex, which are constructed backwards from reward. By contrast, the medial prefrontal cortex and a hippocampal-amygdala border region carry reward-related knowledge but also flexible statistical knowledge of the currently relevant task model. Intriguingly, the ventral striatum encodes prediction-error responses but not the full RL-derived or statistically derived task knowledge. In summary, representations of task knowledge are derived via multiple learning processes operating at different timescales and associated with partially overlapping and partially specialized anatomical regions.

Highlights

  • Learning the structure of the world can be driven by reinforcement and occurs incidentally through experience

  • Participants should know the end of the sequence before its beginning, and associations learnt via reinforcement learning (RL) should comprise stimulus–stimulus (e.g., A to B) as well as stimulus–reward relationships (e.g., D to reward)

  • We investigated the associative structures formed through reinforcement and incidental learning mechanisms and describe two neural circuits with distinct coding schemes (Fig. 7)

Introduction

Learning the structure of the world can be driven by reinforcement and occurs incidentally through experience. Here we contrast associative structures formed through reinforcement and through experience of task statistics. While model-free and model-based learning can be distinguished at the behavioral level, it has been difficult to associate them with distinct neural structures or processes. To do so, participants learned not just associations between a single stimulus and reward but between chains of stimuli leading to reward[2,18,19], as well as the statistical relationships between stimuli regardless of reward. This allowed us to test whether the associative structures derived from different learning processes might prove more distinguishable than their prediction errors (PEs). We undertook two main series of analyses of behavior and neural activity that contrasted knowledge learned from RL with knowledge of statistical relationships. We tested whether RL-acquired knowledge may be static and inflexible[20] compared with the cognitive maps formed flexibly through statistical learning[21].
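The claim that RL constructs value backwards from reward along a stimulus chain can be illustrated with a minimal temporal-difference sketch. This is a toy model for intuition only, not the study's actual task or analysis; the chain labels, learning rate, and discount factor are illustrative assumptions.

```python
# Toy TD(0) sketch: value propagates backwards from reward along a
# deterministic stimulus chain A -> B -> C -> D -> reward.
# All names and parameters are illustrative, not the study's task.
def td_learn(chain, n_episodes=200, alpha=0.1, gamma=0.9):
    """Learn state values for a fixed stimulus chain ending in reward."""
    V = {s: 0.0 for s in chain}
    for _ in range(n_episodes):
        for i, s in enumerate(chain):
            if i == len(chain) - 1:           # final stimulus before reward
                reward, v_next = 1.0, 0.0
            else:
                reward, v_next = 0.0, V[chain[i + 1]]
            delta = reward + gamma * v_next - V[s]   # prediction error
            V[s] += alpha * delta                    # PE-driven update
    return V

values = td_learn(["A", "B", "C", "D"])
```

Because each stimulus is updated from the value of its successor, early stimuli acquire value only after later ones do, producing a gradient that increases toward reward (V[A] < V[B] < V[C] < V[D]) — the backwards construction referred to above.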
