Abstract

Reinforcement learning models have been studied extensively for decision-making tasks with reward feedback. However, in designing experiments to collect data for Q-learning models, the quantitative effect of the presented stimuli on the estimation precision of participant parameters has generally not been considered. That is, the lack of a mathematical framework has prevented researchers from designing an optimal experiment. To tackle this problem, this study formulates a stochastic representation of the Q-learning model, one of the most commonly applied reinforcement learning models, and analytically derives its Fisher information. Building on this derivation, a two-step procedure is proposed for selecting stimuli that are optimal in terms of estimation precision, combining a low-cost Fisher information evaluation with a more detailed finite-sample Monte Carlo simulation. The simulation studies show that reward probability reversals lead to high estimation precision for the learning rate parameter, whereas for the inverse temperature parameter, a larger difference in reward probability between options leads to higher estimation precision. These results reveal that the optimal experimental design depends on which trait parameters of the Q-learning model are of interest to researchers. Furthermore, using stimuli that are undesirable in terms of trait parameter precision is found to induce a large bias in the correlation coefficient estimate. Based on these results, approaches to designing experiments with the Q-learning model are discussed.
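
For readers unfamiliar with the parameters referred to above, the following is a minimal sketch of the standard Q-learning model with a softmax choice rule, and of the Fisher information for its trait parameters, assuming the usual parameterization with learning rate \(\alpha\) and inverse temperature \(\beta\); the paper's exact stochastic representation may differ in detail.

\[
Q_{t+1}(a_t) = Q_t(a_t) + \alpha \bigl( r_t - Q_t(a_t) \bigr),
\qquad
P(a_t = a \mid Q_t) = \frac{\exp\{\beta\, Q_t(a)\}}{\sum_{a'} \exp\{\beta\, Q_t(a')\}},
\]
\[
I(\theta) = \mathbb{E}\!\left[ \nabla_\theta \log p(a_{1:T} \mid \theta)\, \nabla_\theta \log p(a_{1:T} \mid \theta)^{\top} \right],
\qquad \theta = (\alpha, \beta),
\]

where the expectation is taken over the choice and reward sequences generated under a given stimulus (reward probability) schedule, so that candidate schedules can be compared by the estimation precision they afford, e.g., via the Cramér-Rao bound \(I(\theta)^{-1}\).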
