Abstract

Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective, goal-directed model-based (MB) strategy and a faster, retrospective, habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. We therefore developed a novel two-level contingency change task in which the transition contingencies between states change every few trials; MB and MF control predict different responses following these contingency changes, allowing their relative influence to be inferred. Additionally, we manipulated the rate of contingency changes to determine whether the volatility of contingency changes shifts subjects between an MB and an MF strategy. We found that human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems to reinforcement learning. Further, subjects did not remain at one level of MB/MF behavior but rather shifted towards more MB behavior over the first two blocks, a shift attributable to the extent of training rather than to the rate of contingency changes. We demonstrate that flexibility to contingency changes can distinguish MB and MF strategies, with human subjects using a hybrid strategy that shifts towards more MB behavior over blocks and, in turn, yields a higher payoff.
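Read concretely, a "hybrid MB/MF strategy" can be expressed as a weighted mixture of the two systems' action values, with the weight setting their relative influence on choice. The sketch below is a minimal illustration assuming the common weighted-softmax formulation; the function name, the weight w, the inverse temperature beta, and the example values are illustrative assumptions, not a model fitted in this study.

```python
import numpy as np

def hybrid_choice_probs(q_mb, q_mf, w, beta):
    """Mix model-based and model-free action values and map them to
    choice probabilities with a softmax rule.

    w    : weight on the MB system (0 = purely MF, 1 = purely MB)
    beta : inverse temperature controlling choice determinism
    """
    q = w * np.asarray(q_mb, dtype=float) + (1.0 - w) * np.asarray(q_mf, dtype=float)
    prefs = beta * (q - q.max())              # subtract max for numerical stability
    return np.exp(prefs) / np.exp(prefs).sum()

# After a transition-contingency change, the MB system can immediately favor a
# different action than the still-cached MF values; the weight w determines
# which system dominates the observed choice.
print(hybrid_choice_probs(q_mb=[0.2, 0.8], q_mf=[0.7, 0.3], w=0.7, beta=5.0))
```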

Highlights

  • For optimal decision-making, animals must learn to associate the choices they make with the outcomes that arise from them

  • Actions can lead to outcomes that change in value – one day, your favorite food is poorly made and less pleasant

  • We found that human subjects showed a hybrid strategy in reacting to contingency changes in our task, with an increased influence of MB control over the first two blocks

Introduction

For optimal decision-making, animals must learn to associate the choices they make with the outcomes that arise from them. Classical learning theories suggest that this problem is addressed by habitual or goal-directed strategies for reinforcement learning [1, 2]. These strategies differ in that habitual behavior reinforces responses based on environmental cues, whereas goal-directed behavior considers action-outcome relationships, that is, contingencies, in the environment. Habitual and goal-directed strategies have been implemented as model-free (MF) and model-based (MB) reinforcement learning algorithms, respectively. Both algorithms make decisions by estimating action values and choosing the actions that maximize reward in the long term [3, 4]. The MF system achieves this retrospectively, caching past rewards using a reward prediction error signal [5], whereas the MB system achieves this prospectively, planning over a learned internal model of the state transitions and rewards in the environment [6, 7].
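The distinction can be made concrete with a small sketch. The toy two-step environment, variable names, and learning-rate values below are illustrative assumptions, not the task or models used in this study: the MF learner caches action values through a temporal-difference update driven by a reward prediction error, while the MB learner computes action values on each trial by looking ahead through learned estimates of the transition and reward structure.

```python
import numpy as np

# Toy two-step environment (illustrative only): one first-stage state (s = 0)
# leading to two second-stage states (s = 1, 2), each with two actions.
n_states, n_actions = 3, 2
alpha, gamma = 0.1, 1.0  # assumed learning rate and discount factor

# --- Model-free (MF) system: retrospective caching of action values ---
Q_mf = np.zeros((n_states, n_actions))

def mf_update(s, a, r, s_next, terminal):
    """Cache value with a temporal-difference update driven by a
    reward prediction error."""
    bootstrap = 0.0 if terminal else gamma * Q_mf[s_next].max()
    delta = r + bootstrap - Q_mf[s, a]   # reward prediction error
    Q_mf[s, a] += alpha * delta

# --- Model-based (MB) system: prospective planning over a learned model ---
T = np.ones((n_states, n_actions, n_states)) / n_states  # transition estimates
R = np.zeros((n_states, n_actions))                      # reward estimates

def model_update(s, a, r, s_next):
    """Refine the internal model of state transitions and rewards."""
    observed = np.zeros(n_states)
    observed[s_next] = 1.0
    T[s, a] += alpha * (observed - T[s, a])
    R[s, a] += alpha * (r - R[s, a])

def mb_values(s):
    """Plan by lookahead: expected value of each action under the current
    transition estimates, using the learned second-stage reward estimates."""
    return R[s] + gamma * T[s] @ R.max(axis=1)
```

Because mb_values reads the transition estimates directly, a change in contingencies shifts MB preferences as soon as the internal model is updated, whereas the MF cache only catches up after further rewarded experience; this asymmetry is what allows responses after contingency changes to separate the two controllers.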
