Abstract

Temporal-difference (TD) learning models afford the neuroscientist a theory-driven roadmap in the quest for the neural mechanisms of reinforcement learning. The application of these models to understanding the role of phasic midbrain dopaminergic responses in reward prediction learning constitutes one of the greatest success stories in behavioural and cognitive neuroscience. Critically, the classic learning paradigms associated with TD are poorly suited to cast light on its neural implementation, thus hampering progress. Here, we present a serial blocking paradigm in rodents that overcomes these limitations and allows for the simultaneous investigation of two cardinal TD tenets; namely, that learning depends on the computation of a prediction error, and that reinforcing value, whether intrinsic or acquired, propagates back to the onset of the earliest reliable predictor. The implications of this paradigm for the neural exploration of TD mechanisms are highlighted.

Highlights

  • Error-correcting algorithms as specified by associative (e.g., [1]) and temporal-difference reinforcement learning (TDRL; e.g., [2]) models have provided a useful theory-driven approach to examining how learning is implemented in the brain

  • We present a serial blocking paradigm that is designed to explore the neural circuits underpinning TDRL

  • This paradigm is ideally suited to investigating the neural bases of TDRL’s fundamental assumptions that (1) learning will not occur in the absence of a prediction error and that (2) the value of the reinforcer propagates back to the onset of the earliest reliable predictor via the second-order conditioning effect observed in Group Control Serial


Summary

Introduction

Error-correcting algorithms as specified by associative (e.g., [1]) and temporal-difference reinforcement learning (TDRL; e.g., [2]) models have provided a useful theory-driven approach to examining how learning is implemented in the brain. The classic blocking design has shortcomings that have limited its application to neuroscience, particularly in the context of temporally precise neuronal recording (e.g., behavioural electrophysiology) or manipulation techniques (e.g., optogenetics). This is because in the standard design the blocking and blocked cues are presented simultaneously in compound, making it difficult to track neural responses to each cue individually and to dissociate the effects of neural manipulations on them. To provide a more suitable testbed for examining TDRL’s tenets and their neural underpinnings, we designed a serial blocking paradigm in which the blocking and blocked cues are presented serially during the blocking phase. In this design, the blocking cue is initially trained in a trace conditioning procedure in which cue offset and reinforcer onset are separated by a trace interval (blocking cue → trace → reinforcer). These findings have important implications for the neural exploration of reinforcement learning mechanisms.
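The two TD tenets named above (learning driven by a prediction error, and value propagating back toward the earliest predictor) can be illustrated with a minimal tabular TD(0) sketch. This is a generic textbook illustration, not the model or the analysis used in the paper: the function name `run_td0`, the five-step trial, and the values of `alpha`, `gamma`, and `reward` are all assumptions chosen for the example.

```python
def run_td0(n_trials, n_steps=5, reward=1.0, alpha=0.1, gamma=1.0):
    """Tabular TD(0) over a fixed chain of within-trial time steps.

    Hypothetical setup: step 0 stands for cue onset, and the reinforcer
    arrives on the transition out of the last step. Returns the learned
    values and the prediction error at reinforcer delivery on each trial.
    """
    values = [0.0] * n_steps
    reward_errors = []
    for _ in range(n_trials):
        for t in range(n_steps):
            last = t == n_steps - 1
            r = reward if last else 0.0
            # The trial ends after the last step, so its successor has value 0.
            next_value = 0.0 if last else values[t + 1]
            # Prediction error: delta = r + gamma * V(next) - V(current)
            delta = r + gamma * next_value - values[t]
            values[t] += alpha * delta
            if last:
                reward_errors.append(delta)
    return values, reward_errors


if __name__ == "__main__":
    early_values, early_errors = run_td0(n_trials=1)
    late_values, late_errors = run_td0(n_trials=2000)
    print("values after 1 trial:    ", early_values)
    print("values after 2000 trials:", late_values)
    print("first and last reward-time errors:", early_errors[0], late_errors[-1])
```

In this sketch, only the step adjacent to the reinforcer gains value on the first trial. With further training the value spreads back to cue onset while the prediction error at reinforcer delivery shrinks toward zero, so a fully predicted reinforcer supports no further learning.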
