Abstract

An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers. The former learns the values of actions directly from past encounters, and the latter exploits a cognitive map of the task to calculate these values prospectively. Considerable attention has been paid to how these systems interact during choice, but whether and how knowledge of a cognitive map contributes to the way MF and MB controllers assign credit (i.e., to how they revaluate actions and states following the receipt of an outcome) remains underexplored. Here, we examine such sophisticated credit assignment using a dual-outcome bandit task. We provide evidence that knowledge of a cognitive map influences credit assignment in both MF and MB systems, mediating subtly different aspects of apparent relevance. Specifically, we show that MF credit assignment is enhanced for rewards that are related to a choice, in contrast to choice-unrelated rewards, which reinforce subsequent choices negatively. This modulation is possible only on the basis of knowledge of task structure. MB credit assignment, on the other hand, is boosted for outcomes that affect the differences in value between the offered bandits. We consider mechanistic accounts and the normative status of these findings. We suggest the findings extend the scope and sophistication of cognitive map-based credit assignment during reinforcement learning, with implications for understanding behavioral control.

Highlights

  • An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers

  • In support of our hypothesis that MF credit assignment (MFCA) is guided by a cognitive map (CM), we found evidence that credit for choice-related and -unrelated outcomes is assigned to actions in a different manner

  • Prior to testing our second hypothesis pertaining to importance-based MB credit assignment (MBCA), we show that MBCA occurs for both choice-related and -unrelated vegetables


Summary

Human subjects exploit a cognitive map for credit assignment

Recent research highlights competitive and cooperative interactions between the MF and MB systems, including speed-accuracy tradeoffs [22], reliability-based arbitration [1, 23], and a plan-to-habit strategy [24], with a focus on the prospective-planning role served by the MB system during choice. We demonstrated another influence of a CM (and, as we described it there, MB processes) in guiding credit assignment (CA) to MF action-values (i.e., affecting how MF values of actions and states are updated as reward outcomes are received) [25]. CA to relevant actions poses a challenge because one is often flooded with reward feedback that is not causally attributed. We addressed this issue in a reinforcement learning framework wherein choice is mutually controlled by value-caching MF and prospective, planning MB systems.
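The distinction between value caching and prospective planning can be illustrated with a minimal sketch. This is not the authors' computational model: the task map, the bandit and outcome names, the learning rate `alpha`, and all function names are hypothetical, chosen only to show how an MF learner credits the chosen action directly, whereas an MB learner credits the observed outcome and derives action values through the cognitive map.

```python
# Hypothetical cognitive map: each bandit leads to two outcomes.
# Bandits "A" and "B" share outcome "o2" (names are illustrative only).
TASK_MAP = {"A": ("o1", "o2"), "B": ("o2", "o3")}


def mf_update(q_mf, action, reward, alpha=0.5):
    """Model-free: move the cached value of the chosen action toward the reward."""
    q = dict(q_mf)
    q[action] += alpha * (reward - q[action])
    return q


def mb_update(v_outcome, outcome, reward, alpha=0.5):
    """Model-based learning step: update the value of the observed outcome."""
    v = dict(v_outcome)
    v[outcome] += alpha * (reward - v[outcome])
    return v


def mb_action_value(v_outcome, action, task_map=TASK_MAP):
    """Model-based: compute an action value prospectively from the map."""
    return sum(v_outcome[o] for o in task_map[action])


# One trial: bandit "A" is chosen and outcome "o2" delivers a reward of 1.
q_mf = mf_update({"A": 0.0, "B": 0.0}, "A", 1.0)
v_out = mb_update({"o1": 0.0, "o2": 0.0, "o3": 0.0}, "o2", 1.0)
```

Under these assumptions, the MF learner raises only the value of the chosen bandit "A", while the MB learner also raises the prospective value of the unchosen bandit "B", because the map says that "B" leads to the rewarded outcome as well.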

