Multiple timescales of reward memory in lateral habenula and midbrain dopamine neurons

Abstract

How do we learn to predict rewards? One approach, embodied by current computational models of reinforcement learning, is to learn a "value function" that maps each state of the world to its predicted future yield of rewards. However, there is evidence [1,2] that the brain contains a multitude of neural elements that learn about the world at different rates, some learning quickly and others slowly. If this were true of the reward system, then reward-predicting neurons would not be bound to a single value function, but could instead choose among multiple estimates of reward value, each computed by integrating past evidence over a different timescale. Here we present direct evidence for this proposal. We trained monkeys to perform a reward-biased saccade task in which the reward value of each trial was predictable from the past reward history [3,4]. We then analyzed single-neuron data [4] recorded from a major source of reward-predictive signals, midbrain dopamine neurons, and from one of their main input structures, the lateral habenula. Dopamine neurons behaved consistently with their role in signaling "reward prediction errors", carrying signals related to the trial's predicted value in response to two task events: (1) a cue indicating the start of the trial, and (2) a cue indicating the trial's reward outcome. However, these two neural judgments of the trial's value were based on very different memories of the past. When the trial began, dopamine neuron activity was influenced by only a single previous reward outcome. But when the trial's outcome was revealed, dopamine neurons abruptly lengthened their memory to reflect at least three previous reward outcomes, close to the timescale of memory seen in animal behavior. Lateral habenula neurons showed the same pattern: a short-timescale memory when the trial began and a long-timescale memory when the outcome was revealed. In addition, many habenula neurons encoded the reward history in their level of tonic activity. This allowed us to see that the timescale of memory developed gradually throughout the trial, lengthening in anticipation of the reward-predictive cue and then fading back to a one-trial memory during the inter-trial interval. These findings suggest that reward-predicting neurons can switch between multiple timescales of reward memory to suit the computational demands of the task at hand, making predictions based on a short-timescale memory during lulls in the task but switching to a more accurate long-timescale memory when new reward information is imminent and the need for prediction is greatest.
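To make the "multiple estimates of reward value" idea concrete, here is a minimal sketch in Python (not code from the study): each estimate is an exponentially weighted average of past reward outcomes with its own learning rate, so a fast-learning estimate corresponds to a short-timescale memory and a slow-learning one to a long-timescale memory. The learning rates, the simulated reward sequence, and all names below are illustrative assumptions, not quantities reported in the abstract.

```python
# Minimal sketch (not the study's analysis code): reward-value estimates that
# integrate past outcomes over different timescales, implemented as leaky
# integrators with different learning rates. All parameter values below are
# illustrative assumptions.
import random

def update_value(value, reward, learning_rate):
    """One step of an exponentially weighted average of reward history.

    A large learning_rate weights mostly the latest trial (short timescale);
    a small learning_rate integrates over many past trials (long timescale).
    """
    return value + learning_rate * (reward - value)

# Hypothetical learning rates: roughly one-trial vs. several-trial memory.
timescales = {"short": 0.9, "long": 0.3}
values = {name: 0.5 for name in timescales}  # start at a neutral prediction

random.seed(0)
for trial in range(10):
    reward = random.choice([0.0, 1.0])  # stand-in for the biased trial outcomes
    # A reward prediction error compares the outcome with the value held
    # *before* this trial's outcome is folded in.
    rpe = {name: reward - values[name] for name in timescales}
    for name, alpha in timescales.items():
        values[name] = update_value(values[name], reward, alpha)
    print(f"trial {trial}: r={reward:.0f}  "
          f"V_short={values['short']:.2f}  V_long={values['long']:.2f}  "
          f"RPE short={rpe['short']:+.2f}  long={rpe['long']:+.2f}")
```

On this framing, the findings would correspond to the circuit reading out something like V_short when the trial begins and something closer to V_long when the outcome cue appears, with the prediction error computed against whichever estimate is currently in use.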
