Previous studies of reinforcement learning (RL) have established that choice outcomes are encoded in a context-dependent fashion. Several computational models have been proposed to explain context-dependent encoding, including reference point centering and range adaptation models. The former assumes that outcomes are centered around a running estimate of the average reward in each choice context, while the latter assumes that outcomes are compared to the minimum reward and then scaled by an estimate of the range of outcomes in each choice context. However, other computational mechanisms can also explain context dependence in RL. In the present study, we introduce a frequency encoding model that assumes outcomes are evaluated based on their proportional rank within a sample of recently experienced outcomes from the local context. We also consider a range-frequency model that combines the range adaptation and frequency encoding mechanisms. We conducted two fully incentivized behavioral experiments using choice tasks for which the candidate models make divergent predictions. The results were most consistent with models that incorporate frequency or rank-based encoding. The findings from these experiments deepen our understanding of the underlying computational processes mediating context-dependent outcome encoding in human RL.
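The four encoding schemes named above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the function names are hypothetical, the context statistics (mean, minimum, maximum) are computed from a fixed sample rather than learned as running estimates, ties in the rank computation use a mid-rank convention, and the range-frequency combination is taken to be a weighted average with a hypothetical weight `w`.

```python
from statistics import mean

def reference_point_centering(outcome, sample):
    """Center the outcome on the average reward of the context.
    Assumption: the sample mean stands in for a running estimate."""
    return outcome - mean(sample)

def range_adaptation(outcome, sample):
    """Subtract the context minimum, then scale by the context range."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        return 0.0  # assumption: a degenerate range maps to 0
    return (outcome - lo) / (hi - lo)

def frequency_encoding(outcome, sample):
    """Proportional rank of the outcome within recent outcomes.
    Assumption: ties count as half (mid-rank convention)."""
    below = sum(x < outcome for x in sample)
    ties = sum(x == outcome for x in sample)
    return (below + 0.5 * ties) / len(sample)

def range_frequency(outcome, sample, w=0.5):
    """Assumed form: weighted average of the range and frequency values.
    The weight w is hypothetical and not taken from the source."""
    return (w * range_adaptation(outcome, sample)
            + (1.0 - w) * frequency_encoding(outcome, sample))

# A skewed context: the same outcome (20) is valued very differently
# depending on which encoding scheme is applied.
recent_outcomes = [0, 10, 20, 100]
for encode in (reference_point_centering, range_adaptation,
               frequency_encoding, range_frequency):
    print(encode.__name__, encode(20, recent_outcomes))
```

In a skewed context such as this one, range adaptation places the outcome 20 near the bottom of the scale while frequency encoding places it above the midpoint. This is the kind of divergence that choice tasks can exploit to tell the models apart.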