Abstract

Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs. model-based mechanisms). We show that the eligibility trace decays not with elapsed time, but rather with the number of discrete decision steps made by the participants. We further show that, unexpectedly, neither monetary rewards nor the environment's spatial regularity significantly modulates behavioral performance. Finally, we found that model-free learning algorithms describe human performance better than model-based algorithms.
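
To make these components concrete, below is a minimal tabular Sarsa(λ) sketch in Python. It illustrates the generic algorithm, not the authors' fitted implementation; all names and values (n_states, n_actions, alpha, gamma, lam, epsilon) are illustrative placeholders. The key line for the present findings is the last one: the eligibility trace E decays once per decision step, mirroring the step-based (rather than time-based) decay reported above.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions = 16, 4                        # placeholder environment size
    alpha, gamma, lam, epsilon = 0.1, 0.95, 0.8, 0.1   # placeholder parameters

    Q = np.zeros((n_states, n_actions))   # action values
    E = np.zeros_like(Q)                  # eligibility traces

    def epsilon_greedy(s):
        # Exploration/exploitation trade-off: random action with probability epsilon
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    def sarsa_lambda_step(s, a, r, s_next, a_next):
        # One on-policy update after a single decision step
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
        E[s, a] += 1.0            # mark the visited state-action pair
        Q += alpha * delta * E    # credit all recently visited pairs
        E *= gamma * lam          # trace decays per decision step, not per unit time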

Highlights

  • Everyday actions are usually not recompensed by immediate reward

  • This result is surprising: a higher exploration rate in a better-remembered condition should lead to better performance. When map formation is involved, however, this seems not to be the case, suggesting that subjects forget some of the states they explore

  • We found no other effects of ISI on any of the remaining Sarsa(λ), Dyna-Q, or exploration/exploitation parameter fits (Tables A.7, A.8, A.9, and A.10 in the Supplementary Material), indicating that the same parameter settings in each model described subject performance well in each condition (see the Dyna-Q sketch after this list)
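
For comparison with the model-based fits mentioned above, the following is a minimal tabular Dyna-Q sketch, again illustrative rather than the paper's fitted model; the planning budget n_planning and all other names and values are assumed placeholders. Dyna-Q augments the model-free update with simulated experience replayed from a learned one-step model of the environment, which is what makes it a model-based comparison point.

    import numpy as np

    rng = np.random.default_rng(1)
    n_states, n_actions, n_planning = 16, 4, 10   # placeholder sizes
    alpha, gamma = 0.1, 0.95                      # placeholder parameters

    Q = np.zeros((n_states, n_actions))
    model = {}   # (s, a) -> (r, s_next): learned one-step world model

    def dyna_q_step(s, a, r, s_next):
        # Direct, model-free update from real experience
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        model[(s, a)] = (r, s_next)   # remember the observed transition
        # Planning: replay simulated transitions drawn from the model
        keys = list(model)
        for _ in range(n_planning):
            ps, pa = keys[rng.integers(len(keys))]
            pr, ps_next = model[(ps, pa)]
            Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])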

Introduction

Everyday actions are usually not recompensed by immediate reward. You start by adding ingredient after ingredient to the dough, but you will not know whether you added too much or too little yeast until your cake is out of the oven. In this case, the feedback is delayed and sparse, making it difficult to infer each action’s outcome. These situations are usually referred to as sequential decision-making.

