Abstract
Humans can select actions by learning, planning, or retrieving motor memories. Reinforcement Learning (RL) associates these processes with three major classes of strategies for action selection: exploratory RL learns state-action values by exploration, model-based RL uses internal models to simulate the future states reached by hypothetical actions, and motor-memory RL retrieves past successful state-action mappings. To investigate the neural substrates that implement these strategies, we conducted a functional magnetic resonance imaging (fMRI) experiment while humans performed a sequential action selection task under conditions that promoted the use of a specific RL strategy. The ventromedial prefrontal cortex and ventral striatum increased activity in the exploratory condition; the dorsolateral prefrontal cortex, dorsomedial striatum, and lateral cerebellum in the model-based condition; and the supplementary motor area, putamen, and anterior cerebellum in the motor-memory condition. These findings suggest that a distinct prefrontal-basal ganglia-cerebellar network implements the model-based RL action selection strategy.
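The three strategy classes described above can be sketched in code. This is a minimal illustrative sketch, not the study's model: the action set, parameter names, and helper signatures (`transition_model`, `reward_fn`) are hypothetical.

```python
import random

# Hypothetical illustration of the three RL action-selection strategies;
# the action set and all names are assumptions for the sketch.
ACTIONS = ["left", "right", "up", "down"]

def exploratory_select(q_values, state, epsilon=0.2):
    """Exploratory RL: epsilon-greedy choice over learned state-action values."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)  # explore a random action
    # exploit the action with the highest learned value in this state
    return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

def model_based_select(state, transition_model, reward_fn):
    """Model-based RL: simulate the state each hypothetical action would
    reach and pick the action whose predicted outcome is most rewarding."""
    return max(ACTIONS, key=lambda a: reward_fn(transition_model(state, a)))

def motor_memory_select(state, memory, fallback):
    """Motor-memory RL: reuse a previously successful state-action mapping;
    defer to another strategy when no memory exists for this state."""
    return memory.get(state) or fallback(state)
```

For example, `motor_memory_select` returns the cached action for a familiar state and falls back (e.g. to exploration) for a novel one, mirroring the idea that strategies trade off computational cost against experience.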
Highlights
Accumulating behavioral evidence[4,5,12,13], supported by computational models of learning and decision-making[14,15], suggests that humans may use multiple decision strategies for learning, including exploratory, model-based, and motor-memory strategies.
Our working hypothesis is that humans rely on distinct action selection strategies depending on their level of experience with a task: (1) in the early stage of learning, with no prior knowledge of state transitions or reward settings, the exploratory strategy is used; (2) as learning continues and an internal model of action outcomes is developed, the model-based strategy is used to expedite learning; (3) in the late stage, after many successful experiences, the motor-memory strategy is used for robust performance with minimal computational load.
We examined behavioral measures to test our hypothesis of differential use of action selection strategies, and analyzed fMRI signals, especially during the pre-start delay period, to identify neural networks associated with behavior suggestive of model-based action planning.
Summary
These results suggest that the high error rate and variable performance in Condition 1 may be explained by active action exploration and learning of the new KM. Subjects' performance in terms of reward score, overall goal reach, and optimal goal reach improved significantly in Condition 2, with a delayed start, a finding that replicates our previous work[5]. One significant difference between previous experiments and the current study is that the grid-sailing task with a pre-start delay explicitly tested the use of internal models, represented in the KMs, for action sequence generation, whereas previous studies required sequential learning based on simple associations between finger movements and visual or auditory cues. Additional studies will be needed to clarify exactly what function each of these brain areas performs, and how such functions are realized by local neural circuits.