Reinforcement learning and the reward positivity with aversive outcomes.

Elizabeth A Bauer, Brandon K Watanabe, Annmarie Macnamara

doi:10.1111/psyp.14460

Abstract

The reinforcement learning (RL) theory of the reward positivity (RewP), an event-related potential (ERP) component that measures reward responsivity, suggests that the RewP should be largest when positive outcomes are unexpected. This prediction has been supported by work using appetitive outcomes (e.g., money). However, the RewP can also be elicited by the absence of aversive outcomes (e.g., shock). The limited work to date that has manipulated expectancy while using aversive outcomes has not supported the predictions of RL theory. Nonetheless, this work has been difficult to reconcile with the appetitive literature because the RewP was not observed as a reward signal in these studies, which used passive tasks that did not involve participant choice. Here, we tested the predictions of RL theory by manipulating expectancy in an active, choice-based threat-of-shock doors task that was previously found to elicit the RewP as a reward signal. Moreover, we used principal components analysis to isolate the RewP from overlapping ERP components. Eighty participants viewed pairs of doors surrounded by a red or green border; shock delivery was expected (80%) following red-bordered doors and unexpected (20%) following green-bordered doors. The RewP was observed as a reward signal (i.e., no shock > shock) that was not potentiated for unexpected feedback. In addition, the RewP was larger overall for unexpected (vs. expected) feedback. Therefore, the RewP appears to reflect the additive (not interactive) effects of reward and expectancy, challenging the RL theory of the RewP, at least when reward is defined as the absence of an aversive outcome.

Full Text