Abstract

Dual-reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective, planning, model-based (MB) system. This architecture raises the question of how far a MB controller, when devising a plan, takes account of influences from its MF counterpart. We present evidence that a sophisticated, self-reflective MB planner anticipates the influences that its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with a MB system taking account of its MF propensities. Specifically, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains, including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.

Highlights

  • We focus on a common situation wherein a goal-directed Reinforcement Learning (RL) agent can choose or design an environment within which it will later seek rewards

  • Previous evidence showing that biological agents rely on dual MF-MB systems [7,20] raises questions as to the nature and extent of system-interactions that govern overt behaviour

  • An extensive RL literature suggests these interactions are governed by diverse processes, including a speed-accuracy trade-off [29], a trainer-actor dichotomy [8,30], MF reinforcement of MB goals [13], reliability-based arbitration [31], and retrospective MB inference guiding MF credit assignment [14]


Summary

Introduction

Model based planners reflect on their model-free propensities
