Abstract
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs. model-based mechanisms). We show that the eligibility trace decays not with sheer time, but rather with the number of discrete decision steps made by the participants. We further show that, unexpectedly, neither monetary rewards nor the environment's spatial regularity significantly modulate behavioral performance. Finally, we found that model-free learning algorithms describe human performance better than model-based algorithms.
Highlights
Everyday actions are usually not recompensed by immediate reward
This result is surprising, since a higher exploration rate in a more remembered condition should lead to better performance when map formation is involved; this seems not to be the case, suggesting that subjects forget some of the states they explore
We found no other effects of ISI on any of the remaining Sarsa(λ), Dyna-Q, or Exploration/Exploitation parameter fits (Tables A.7, A.8, A.9, and A.10 in Supplementary Material), indicating that the same parameter setting in each model described subject performance well in each condition
Summary
Everyday actions are usually not recompensed by immediate reward. When baking a cake, for example, you start by adding ingredient after ingredient to the dough, but you will not know whether you added too much or too little yeast until your cake is out of the oven. In this case, the feedback is delayed and sparse, making it difficult to infer each action’s outcome. These situations are usually referred to as sequential decision-making.
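To make the delayed-feedback problem concrete, the following is a minimal sketch of a tabular Sarsa(λ) learner whose eligibility trace decays once per decision step rather than with elapsed time, in line with the abstract's description. The chain environment, the function name `run_sarsa_lambda`, and all parameter values are illustrative assumptions; they are not the paradigm, models, or fitted parameters from the study.

```python
import random


def run_sarsa_lambda(n_states=5, episodes=200, alpha=0.1, gamma=0.95,
                     lam=0.8, epsilon=0.1, seed=0):
    """Tabular Sarsa(lambda) on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = left, 1 = right. The only reward (1.0) is given on
    reaching the last state, so feedback is delayed and sparse.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def choose(state):
        # Epsilon-greedy choice with random tie-breaking.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def step(state, action):
        nxt = max(0, state - 1) if action == 0 else state + 1
        done = nxt == n_states - 1
        return nxt, (1.0 if done else 0.0), done

    for _ in range(episodes):
        trace = {key: 0.0 for key in q}  # eligibility of each (state, action)
        state = 0
        action = choose(state)
        done = False
        while not done:
            nxt, reward, done = step(state, action)
            nxt_action = choose(nxt)
            target = reward if done else reward + gamma * q[(nxt, nxt_action)]
            delta = target - q[(state, action)]
            trace[(state, action)] += 1.0  # mark the pair just visited
            for key in q:
                # Earlier decisions share in the error according to their trace.
                q[key] += alpha * delta * trace[key]
                # Assumption: decay is applied per decision step, not per second.
                trace[key] *= gamma * lam
            state, action = nxt, nxt_action
    return q
```

Calling `run_sarsa_lambda()` returns a dictionary of action values keyed by `(state, action)`. In this toy setup, the reward at the end of the chain propagates back to early decisions through the trace, which is the credit-assignment role the eligibility trace plays in Sarsa(λ).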