Abstract

Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. It has been proposed that when sensory data provide only ambiguous information about which state an animal is in, it can predict reward based on a set of probabilities assigned to hypothetical states (called the belief state). Here we examine how dopamine RPEs and subsequent learning are regulated under state uncertainty. Mice are first trained in a task with two potential states defined by different reward amounts. During testing, intermediate-sized rewards are given in rare trials. Dopamine activity is a non-monotonic function of reward size, consistent with RL models operating on belief states. Furthermore, the magnitude of dopamine responses quantitatively predicts changes in behavior. These results establish the critical role of state inference in RL.

Highlights

  • Learning to predict future outcomes is critical for driving appropriate behaviors

  • We examine how dopamine reward prediction errors (RPEs) and subsequent learning are regulated under state uncertainty, and find that both are consistent with reinforcement learning (RL) models operating on belief states

  • We focus our analysis on trial 2 because, according to our model, it is the trial most likely to show an effect of state inference, with the strongest difference from standard RL reward prediction errors (Supplementary Fig. 8a, b)

Introduction

Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. Normative theories propose that animals represent their state uncertainty as a probability distribution or belief state[7,8,9,10], providing a probabilistic estimate of the true state of the environment based on the current sensory information. Standard RL algorithms compute reward predictions over observable states, but under state uncertainty reward predictions should normatively be computed over belief states, which correspond to the probability of being in each possible state. This leads to the hypothesis that dopamine activity should reflect prediction errors computed on belief states. We examine how dopamine RPEs and subsequent learning are regulated under state uncertainty, and find that both are consistent with RL models operating on belief states.
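To make the belief-state computation concrete, the following is a minimal sketch of an RPE computed on a belief state, not the authors' implementation: it assumes two hidden states defined by hypothetical reward amounts, a Gaussian reward likelihood, and an equal prior, and all parameter values (state_rewards, reward_sd, the test rewards) are illustrative placeholders rather than the task's actual parameters.

```python
import numpy as np
from scipy.stats import norm

# Two hidden states, each defined by a different (hypothetical) reward amount.
state_rewards = np.array([1.0, 4.0])  # illustrative mean reward of state A and state B
reward_sd = 0.5                       # illustrative width of the reward likelihood
prior = np.array([0.5, 0.5])          # equal prior over the two states

def belief_state(observed_reward):
    """Posterior probability of each state given an (ambiguous) reward."""
    likelihood = norm.pdf(observed_reward, loc=state_rewards, scale=reward_sd)
    posterior = prior * likelihood
    return posterior / posterior.sum()

def belief_state_rpe(reward, belief, state_values):
    """RPE on a belief state: delta = r - sum_s b(s) * V(s)."""
    return reward - belief @ state_values

# Suppose values for the two trained states have already been learned, and the
# same intermediate reward is delivered on consecutive trials. On the second
# trial the prediction is based on the belief inferred from the first reward,
# so the RPE becomes a non-monotonic function of reward size.
state_values = state_rewards.copy()
for r in [1.0, 2.0, 2.5, 3.0, 4.0]:
    b = belief_state(r)                         # belief after the first reward
    rpe = belief_state_rpe(r, b, state_values)  # RPE to the second, identical reward
    print(f"reward={r:.1f}  belief={np.round(b, 2)}  RPE={rpe:+.2f}")
```

In this sketch the belief shifts sharply between the two trained states in the intermediate reward range, so the prediction rises steeply there; that is what makes the RPE non-monotonic in reward size even though the reward itself increases monotonically.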
