Abstract

The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) that corresponds to the error signal in Temporal Difference (TD) learning algorithms. This hypothesis has been reinforced by numerous studies showing the relevance of TD learning algorithms for describing the role of the basal ganglia in classical conditioning. However, recent recordings of DA neurons during multi-choice tasks have led to contradictory interpretations of whether the DA RPE signal is action-dependent. Thus, the precise TD algorithm (i.e., Actor-Critic, Q-learning or SARSA) that best describes DA signals remains unknown. Here we simulate and precisely analyze these TD algorithms on a multi-choice task performed by rats. We find that DA activity previously reported in this task is best fitted by a TD error that has not fully converged, and that converged faster than the observed behavioral adaptation.

Highlights

  • The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) [1] which corresponds to the error signal in Temporal Difference (TD) learning algorithms [2]

  • While one study [3] suggests that DA neurons encode an RPE compatible with SARSA, results from a second study are interpreted as more consistent with Q-learning [4]

  • These studies only proposed a qualitative comparison of the ability of these TD learning algorithms to explain these patterns of activity

Summary

Introduction

The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) [1] that corresponds to the error signal in Temporal Difference (TD) learning algorithms [2]. Recent recordings of DA neurons during multi-choice tasks investigated this issue and led to contradictory interpretations of whether the DA RPE signal is action-dependent [3] or not [4]. While the first study suggests that DA neurons encode an RPE compatible with SARSA [3], results from the second study are interpreted as more consistent with Q-learning [4].
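
To make the distinction between the three candidate algorithms concrete, the sketch below computes their TD errors for a single transition, using the standard textbook definitions. It is an illustration only, not the simulation used in the study: the function name td_errors, the tables V and Q, the discount factor gamma and all numerical values are assumptions chosen for the example.

```python
def td_errors(reward, gamma, V, Q, s, a, s_next, a_next):
    """Textbook TD errors for one transition (s, a) -> (s_next, a_next).

    V is a list of state values, Q a list of per-state action-value lists.
    All names and values here are hypothetical, for illustration only.
    """
    # Actor-Critic: the critic's error uses state values only,
    # so it does not depend on the value of the chosen action.
    delta_ac = reward + gamma * V[s_next] - V[s]
    # Q-learning: bootstraps on the best available next action.
    delta_q = reward + gamma * max(Q[s_next]) - Q[s][a]
    # SARSA: bootstraps on the action actually selected next.
    delta_sarsa = reward + gamma * Q[s_next][a_next] - Q[s][a]
    return delta_ac, delta_q, delta_sarsa


# Toy example: two states, two actions, no reward on this step.
V = [0.5, 0.75]
Q = [[0.2, 0.8], [0.5, 1.0]]
delta_ac, delta_q, delta_sarsa = td_errors(
    reward=0.0, gamma=0.9, V=V, Q=Q, s=0, a=0, s_next=1, a_next=0
)
print(delta_ac, delta_q, delta_sarsa)
```

With these toy values the three errors differ (approximately 0.175, 0.7 and 0.25), because the next action taken is not the greedy one. This is the kind of situation in which an action-dependent and an action-independent RPE make different predictions.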
