Abstract

Humans can correct their actions on the basis of actions performed in the past, and this ability enables them to adapt to a changing environment. The computational framework of reinforcement learning (RL) has provided a powerful account of such processes. Recently, the dual learning system, modeled as a hybrid model that combines value updates driven by the reward-prediction error with learning-rate modulation driven by a surprise signal, has gained attention as a model for explaining various neural signals. However, the functional significance of the hybrid model has not been established. In the present study, we used computer simulations of a probabilistic reversal learning task to address this question. The hybrid model performed better than the standard RL model over a wide range of parameter settings. These results suggest that the hybrid model is more robust to the mistuning of parameters than the standard RL model when decision-makers must keep learning stimulus-reward contingencies that can change abruptly. The parameter fitting results also indicated that the hybrid model fit the data better than the standard RL model for more than 50% of the participants, suggesting that the hybrid model has more explanatory power for the behavioral data.
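To make the hybrid scheme concrete, the following is a minimal sketch of a surprise-modulated learner of the kind described above, simulated on a probabilistic reversal task. All function and parameter names (simulate_hybrid, alpha0, eta, beta, reversal_every) are illustrative assumptions, not taken from the paper; the paper's exact model specification may differ.

```python
import numpy as np

def softmax(values, beta):
    """Convert action values into choice probabilities (beta = inverse temperature)."""
    e = np.exp(beta * (values - values.max()))
    return e / e.sum()

def reversal_env(action, trial, p_reward=0.8, reversal_every=50):
    """Assumed task environment: the better option switches every `reversal_every` trials."""
    better = (trial // reversal_every) % 2
    p = p_reward if action == better else 1.0 - p_reward
    return int(np.random.rand() < p)

def simulate_hybrid(env, alpha0=0.3, eta=0.5, beta=5.0, n_trials=200):
    """Hybrid learner sketch: reward-prediction-error value update whose
    effective learning rate is scaled by a surprise (associability) signal."""
    V = np.zeros(2)      # action values
    assoc = 1.0          # surprise-driven associability modulating the learning rate
    choices, outcomes = [], []
    for t in range(n_trials):
        p = softmax(V, beta)
        a = np.random.choice(2, p=p)
        r = env(a, t)
        delta = r - V[a]                               # reward-prediction error
        V[a] += alpha0 * assoc * delta                 # value update, modulated learning rate
        assoc = (1.0 - eta) * assoc + eta * abs(delta) # large surprises raise the learning rate
        choices.append(a)
        outcomes.append(r)
    return np.array(choices), np.array(outcomes)

choices, outcomes = simulate_hybrid(reversal_env)
```

Setting the associability term to a constant recovers a standard RL learner with a fixed learning rate, which is the comparison made in the simulations described here.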

Highlights

  • One of the fundamental questions in a study on decision making is how animals and humans select actions in the face of reward and punishment and how they continually update their behavioral strategies according to changes in the environment

  • Task description: The subjects participated in a simple probabilistic reversal learning task whose structure was identical to the one that we had used for the computer simulation

  • In the present study, we investigated the functional significance of the hybrid learning algorithm using computer simulations with a reversal learning task



Introduction

One of the fundamental questions in the study of decision making is how animals and humans select actions in the face of reward and punishment, and how they continually update their behavioral strategies according to changes in the environment. Reinforcement learning (RL) theories address this question by assuming that decision-makers assign a value function to each available action and select actions based on these values. The value functions are continually updated based on the reward-prediction error, which is the mismatch between the expected and actual rewards. Based on this concept, the prediction error quantified in theories of conditioning, such as the Rescorla-Wagner (RW) model (Rescorla and Wagner, 1972), has helped to explain the choice behavior of humans and animals (Daw and Doya, 2006; Corrado and Doya, 2007). The RW model was originally proposed as a model of classical conditioning in which the animal is assumed to learn the associative strength between cues and outcomes from prediction errors; the prediction errors drive the change in associative strength. In the RL literature, the same update rule has been used to update the value function that guides action selection.
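As a brief illustration of the delta rule underlying both the RW model and standard RL value updates, the following sketch shows the prediction error driving the change in associative strength (the parameter name alpha and the example values are illustrative, not taken from the paper):

```python
def rw_update(V, reward, alpha=0.3):
    """Delta-rule update: move the value estimate toward the outcome
    by a fixed fraction (alpha) of the reward-prediction error."""
    delta = reward - V          # reward-prediction error
    return V + alpha * delta    # updated value / associative strength

# Example: the value of a consistently rewarded cue climbs toward 1.0.
V = 0.0
for _ in range(10):
    V = rw_update(V, reward=1.0)
print(round(V, 3))  # approaches 1.0 as learning proceeds
```

In the standard RL model the learning rate alpha is fixed, whereas the hybrid model described above lets a surprise signal modulate it from trial to trial.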

