Abstract

In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet two effects of valence are observed. First, decisions in loss contexts are slower. Second, loss contexts decrease individuals’ confidence in their choices. Whether these two effects are manifestations of a single mechanism or can be partially dissociated is unknown. Across six experiments, we attempted to disrupt the valence-induced motor bias by manipulating the mapping between decisions and actions and by imposing constraints on response times (RTs). Our goal was to assess the presence of the valence-induced confidence bias in the absence of the RT bias. We observed both motor and confidence biases despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are highly robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, valence-induced motor and confidence biases seem to be partly dissociable. These results highlight important new mechanistic constraints that should be incorporated into learning models to jointly explain choices, response times, and confidence.

Highlights

  • In the reinforcement learning context, reward-seeking and punishment-avoidance present an intrinsic and fundamental informational asymmetry

  • We addressed two research questions: first, are valence-induced motor and confidence biases robust and replicable? Second, can the confidence bias be observed in the absence of the motor bias? On the second question, previous research has yielded conflicting results that generated two opposing predictions

  • The correlation between confidence and response times (RTs) was modulated by our experimental manipulations (effect of experiment: F(5, 102) = 9.91, P < 0.001, η2 = 0.32); post-hoc tests revealed that it was significantly altered by the manipulations in each of Exps. 3–6 (Figure S2; see the analysis sketch after this list)
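
The test reported above is a one-way ANOVA comparing per-participant confidence–RT correlations across the six experiments. Below is a minimal sketch of that analysis in Python on simulated data; the group size of 18 participants per experiment is an assumption chosen only to match the reported degrees of freedom, and all variable names and simulated effects are illustrative rather than taken from the study.

# Minimal sketch (simulated, hypothetical data): per-participant
# confidence-RT correlations compared across experiments with a
# one-way ANOVA, mirroring the F(5, 102) test reported above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group, n_trials = 6, 18, 200  # 6 x 18 = 108 participants

corrs_by_exp = []
for exp in range(n_experiments):
    corrs = []
    for _ in range(n_per_group):
        rt = rng.lognormal(mean=0.0, sigma=0.3, size=n_trials)
        # Confidence is negatively coupled to RT; the coupling strength
        # varies across experiments (the simulated "manipulation" effect).
        confidence = -(0.5 - 0.08 * exp) * stats.zscore(rt) + rng.normal(size=n_trials)
        corrs.append(stats.pearsonr(rt, confidence)[0])  # one r per participant
    corrs_by_exp.append(np.array(corrs))

f_val, p_val = stats.f_oneway(*corrs_by_exp)
grand = np.concatenate(corrs_by_exp)
ss_between = sum(len(c) * (c.mean() - grand.mean()) ** 2 for c in corrs_by_exp)
ss_total = ((grand - grand.mean()) ** 2).sum()  # eta2 = SS_between / SS_total
print(f"F({n_experiments - 1}, {len(grand) - n_experiments}) = {f_val:.2f}, "
      f"p = {p_val:.3g}, eta2 = {ss_between / ss_total:.2f}")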


Introduction

In the reinforcement learning context, reward-seeking and punishment-avoidance present an intrinsic and fundamental informational asymmetry. In the former situation, accurate choice (i.e., reward maximization) increases the frequency of the reinforcer (the reward); in the latter, accurate choice (i.e., punishment minimization) decreases the frequency of the reinforcer (the punishment). Despite this asymmetry, humans learn to seek reward and to avoid punishment equally well (Fontanesi et al., 2019; Guitart-Masip et al., 2012; Palminteri et al., 2015). This is robustly demonstrated in experimental data and nicely explained by context-dependent reinforcement-learning models (Fontanesi et al., 2019; Palminteri et al., 2015), which can be seen as formal computational instantiations of …
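
To make the cited modeling approach concrete, the following is a minimal sketch of a context-dependent (relative-value) learning update in the spirit of Palminteri et al. (2015). The parameter values, payoff statistics, and random choice policy are illustrative assumptions, not the authors' exact model.

# Minimal sketch of a context-dependent (relative-value) update;
# parameters and payoffs are illustrative assumptions.
import numpy as np

alpha_q, alpha_v = 0.3, 0.3   # learning rates for option and context values
rng = np.random.default_rng(1)

Q = np.zeros(2)  # option values within one "loss" context
V = 0.0          # learned context (reference-point) value

for _ in range(200):
    choice = int(rng.random() < 0.5)  # random policy, for illustration only
    # Option 0 is punished (-1) with p = 0.75, option 1 with p = 0.25.
    p_punish = 0.75 if choice == 0 else 0.25
    outcome = -1.0 if rng.random() < p_punish else 0.0
    # Outcomes are evaluated relative to the context value, so an avoided
    # punishment (outcome 0 in a loss context) yields a positive signal.
    relative_outcome = outcome - V
    V += alpha_v * (outcome - V)                           # context-value update
    Q[choice] += alpha_q * (relative_outcome - Q[choice])  # option-value update

print("Q (relative to context):", Q.round(2), "| V:", round(V, 2))

Under this scheme, the better option in a loss context acquires a positive relative value, just as the better option in a gain context does, which is one way such models capture equally good reward seeking and punishment avoidance.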
