Abstract
In a complex and uncertain world, how do we select appropriate behavior? One possibility is that we choose actions that are highly reinforced by their probabilistic consequences (model-free processing). However, we may instead plan actions prior to their actual execution by predicting their consequences (model-based processing). It has been suggested that the brain contains multiple distinct systems involved in reward prediction. Several studies have tried to allocate the model-free and model-based systems to the striatum and the lateral prefrontal cortex (LPFC), respectively. Although there is much support for this hypothesis, recent research has revealed discrepancies. To understand the nature of the reward prediction systems in the LPFC and the striatum, a series of single-unit recording experiments were conducted. LPFC neurons were found to infer the reward associated with the stimuli even when the monkeys had not yet learned the stimulus-reward (SR) associations directly. Striatal neurons seemed to predict the reward for each stimulus only after directly experiencing the SR contingency. The one exception was "Exclusive Or" situations, in which striatal neurons could predict the reward without direct experience. Previous single-unit studies in monkeys have reported that neurons in the LPFC encode category information and represent reward information specific to a group of stimuli. Here, as an extension of these findings, we review recent evidence that a group of LPFC neurons can predict reward specific to a category of visual stimuli defined by relevant behavioral responses. We suggest that the functional difference in reward prediction between the LPFC and the striatum is that LPFC neurons can utilize an abstract code, whereas striatal neurons code individual associations between stimuli and reward but cannot utilize an abstract code.
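To make the distinction concrete, the following is a minimal sketch of the two kinds of processing contrasted above: a model-free learner that caches stimulus-reward values only through direct reinforcement, and a model-based learner that predicts reward by looking ahead through a learned model. All states, parameter values, and function names are illustrative assumptions, not the task or analyses used in the reviewed studies.

# Minimal sketch of model-free vs. model-based reward prediction.
# All states, parameters, and function names are illustrative assumptions.

N_STATES, N_ACTIONS = 3, 2
ALPHA, GAMMA = 0.1, 0.9

# Model-free: cache action values and update them only from experienced outcomes.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def model_free_update(s, a, r, s_next):
    """Q-learning: nudge the cached value of (s, a) toward the sampled target."""
    target = r + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (target - Q[s][a])

# Model-based: learn a transition/reward model, then plan by simulating consequences.
T = {}   # (state, action) -> predicted next state
R = {}   # (state, action) -> predicted immediate reward

def model_based_value(s, a, depth=2):
    """Evaluate (s, a) by looking ahead through the learned model."""
    if depth == 0 or (s, a) not in T:
        return 0.0
    s_next = T[(s, a)]
    future = max(model_based_value(s_next, a2, depth - 1) for a2 in range(N_ACTIONS))
    return R[(s, a)] + GAMMA * future

# Toy experience: action 1 in state 0 leads to state 1, where action 0 is rewarded.
T[(0, 1)], R[(0, 1)] = 1, 0.0
T[(1, 0)], R[(1, 0)] = 2, 1.0

# The model-based evaluator already assigns value to (0, 1) by planning ahead,
# while the model-free cache stays at zero until the reward is experienced directly.
print(model_based_value(0, 1))   # 0.9
print(Q[0][1])                   # 0.0

The contrast in this toy example mirrors the dissociation discussed above: planning over a learned model can assign value to a stimulus before its reward has ever been experienced, whereas a cached (model-free) value changes only after direct reinforcement.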
Highlights
Reward prediction is paramount for learning behavior (Sutton and Barto, 1998; Schultz, 2006) and for decision-making processes in the brain (Rangel et al., 2008).
By considering the neuronal activity recorded from the lateral prefrontal cortex (LPFC) and the striatum in the sequential paired-association task with the asymmetric reward schedule, we can verify whether the neurons in these areas use a model-based or a model-free learning process.
Some studies have shown that the activity of dopamine neurons can be modulated by memorized experiences of state transitions and SR associations, thereby representing the reward prediction error (RPE) that plays a role in reward learning in the striatum (Nakahara et al., 2004; Bromberg-Martin et al., 2010; Enomoto et al., 2011).
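As an illustration only, the standard temporal-difference form of the RPE mentioned above is delta = r + gamma * V(s') - V(s); the short sketch below computes it, with the discount factor and all numerical values chosen arbitrarily and not taken from the cited studies.

# Minimal sketch of the temporal-difference reward prediction error (RPE);
# the discount factor and the example values are illustrative assumptions.
GAMMA = 0.9

def reward_prediction_error(r, v_current, v_next, gamma=GAMMA):
    """delta = r + gamma * V(s') - V(s): positive when the outcome beats the prediction."""
    return r + gamma * v_next - v_current

# An unexpectedly large reward (predicted 0.2, received 1.0, no further value expected)
# yields a positive RPE, the teaching signal thought to drive learning in the striatum.
print(reward_prediction_error(r=1.0, v_current=0.2, v_next=0.0))   # 0.8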
Summary
Reward prediction is paramount for learning behavior (Sutton and Barto, 1998; Schultz, 2006) and for decision-making processes in the brain (Rangel et al., 2008). Much research has shown that many brain areas are involved in reward prediction (Yamada et al., 2004; Knutson and Cooper, 2005; Padoa-Schioppa and Assad, 2006; Paton et al., 2006; Behrens et al., 2007, 2008; Hare et al., 2008; Hayden et al., 2008; Hikosaka et al., 2008; Haber and Knutson, 2010; Rushworth et al., 2011; Levy and Glimcher, 2012; Garrison et al., 2013). In particular, the basal ganglia and multiple sub-areas of the prefrontal cortex play important but different roles in the reward prediction process.