Abstract

Reinforcement learning describes the process by which during a series of trial-and-error attempts, actions that culminate in reward are reinforced, becoming more likely to be chosen in similar circumstances. When decisions are based on sensory stimuli, an association is formed between the stimulus, the action and the reward. Computational, behavioral and neurobiological accounts of this process successfully explain simple learning of stimuli that differ in one aspect, or along a single stimulus dimension. However, when stimuli may vary across several dimensions, identifying which features are relevant for the reward is not trivial, and the underlying cognitive process is poorly understood. To study this we adapted an intra-dimensional/ extra-dimensional set-shifting paradigm to train rats on a multi-sensory discrimination task. In our setup, stimuli of different modalities (spatial, olfactory and visual) are combined into complex cues and manipulated independently. In each set, only a single stimulus dimension is relevant for reward. To distinguish between learning and decision-making we suggest a weighted attention model (WAM). Our model learns by assigning a separate learning rule for the values of features of each dimension (e.g., for each color), reinforced after every experience. Decisions are made by comparing weighted averages of the learnt values, factored by dimension specific weights. Based on the observed behavior of the rats we estimated the parameters of the WAM and demonstrated that it outperforms an alternative model, in which a learnt value is assigned to each combination of features. Estimated decision weights of the WAM reveal an experience-based bias in learning. In the first experimental set the weights associated with all dimensions were similar. The extra-dimensional shift rendered this dimension irrelevant. However, its decision weight remained high for the early learning stage in this last set, providing an explanation for the poor performance of the animals. Thus, estimated weights can be viewed as a possible way to quantify the experience-based bias.

Highlights

  • A common conceptualization of choices between alternatives highlights two components to the decision process: outcome valuation and mapping of states and actions to outcomes (Rangel et al, 2008; Gilboa and Marinacci, 2016)

  • Only a single stimulus dimension is relevant for reward

  • The pair of odors and the pair of colored light emitting diodes (LEDs) were randomly assigned to the two arms

Read more

Summary

Introduction

A common conceptualization of choices between alternatives highlights two components to the decision process: outcome valuation and mapping of states and actions to outcomes (Rangel et al, 2008; Gilboa and Marinacci, 2016). Modeling approaches of the learning process in decision making contexts have differed between fields of research Whereas in economics it has been predominantly based on the Bayesian framework (Gilboa and Marinacci, 2016), a frequentist-based reinforcement learning (RL) (Sutton and Barto, 1998) approach was instrumental in explaining human and animal choice behavior and in pointing to the function of the underlying neurobiological structures. Classical RL models (Wagner and Rescorla, 1972; Sutton and Barto, 1998) assume that a decision-maker assigns a value to each action (or action-state pair) and updates it according to the reward history. Learning in such scenarios involves two stages: structure learning and parametric learning (Braun et al, 2010)

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call