Abstract

Model-based and model-free reinforcement learning (RL) have been suggested as algorithmic realizations of goal-directed and habitual action strategies. Model-based RL is more flexible than model-free but requires sophisticated calculations using a learnt model of the world. This has led model-based RL to be identified with slow, deliberative processing, and model-free RL with fast, automatic processing. In support of this distinction, it has recently been shown that model-based reasoning is impaired by placing subjects under cognitive load—a hallmark of non-automaticity. Here, using the same task, we show that cognitive load does not impair model-based reasoning if subjects receive prior training on the task. This finding is replicated across two studies and a variety of analysis methods. Thus, task familiarity permits use of model-based reasoning in parallel with other cognitive demands. The ability to deploy model-based reasoning in an automatic, parallelizable fashion has widespread theoretical implications, particularly for the learning and execution of complex behaviors. It also suggests a range of important failure modes in psychiatric disorders.

Highlights

  • A wealth of experimental data indicates the brain uses at least two distinct decision making strategies in value-guided choice

  • A compelling computational account of these two control mechanisms draws on reinforcement learning (RL) theory [1]

  • Our central question was whether the shift away from model-based control under cognitive load would be reduced if subjects had prior training on the two-step task

Introduction

A wealth of experimental data indicates that the brain uses at least two distinct decision-making strategies in value-guided choice. One involves prospective reasoning about action-outcome contingencies, while the other retrospectively links rewards to actions [1,2,3]. Prospective reasoning relies on a learned model of the world to accurately predict the outcomes of actions, even in the face of changing action-reward contingencies [1,7,8]. This is suggested to render model-based reasoning more flexible, but at a heightened computational cost [3]. The interplay between these two choice strategies also has substantial clinical implications.
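
To make the contrast concrete, the sketch below illustrates the two strategies on a Daw-style two-step task. It is a minimal illustration, not the model fitted in this study: the state layout, the learning rate, and the 0.7/0.3 transition probabilities are standard defaults assumed here for the example.

```python
import numpy as np

# Illustrative sketch only: contrasting a retrospective (model-free) TD update
# with a prospective (model-based) planning step on a two-step task.
# States: 0 = first stage; 1 and 2 = second-stage states. Two actions per state.

n_states, n_actions = 3, 2
alpha = 0.3  # learning rate (assumed value, for illustration only)

# --- Retrospective strategy: cache action values and update them after the
#     fact, linking the obtained reward to the action just taken. ---
Q_mf = np.zeros((n_states, n_actions))

def model_free_update(state, action, reward, next_state=None):
    """One-step temporal-difference update of the cached action value."""
    target = reward if next_state is None else reward + Q_mf[next_state].max()
    Q_mf[state, action] += alpha * (target - Q_mf[state, action])

# --- Prospective strategy: plan with a learned model of the world. ---
# transition[a, s] = probability that first-stage action a leads to
# second-stage state s (the 0.7/0.3 split follows the standard task design).
transition = np.array([[0.7, 0.3],
                       [0.3, 0.7]])

def model_based_values(second_stage_q):
    """First-stage action values obtained by averaging over predicted outcomes."""
    best_second_stage = second_stage_q.max(axis=1)  # value of each second-stage state
    return transition @ best_second_stage           # expected value per first-stage action

# Example: a rewarded choice (action 0) in second-stage state 1.
model_free_update(state=1, action=0, reward=1.0)    # model-free: credit the cached value
print(model_based_values(Q_mf[1:]))                 # model-based: re-plan from the model
```

The key computational difference the paragraph describes is visible here: the model-free update touches only the single cached value for the action taken, whereas the model-based computation re-evaluates every first-stage action through the transition model whenever second-stage values change.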
