Spike timing dependent plasticity implements reinforcement learning

Roberto A Santiago,Patrick D Roberts,Gerardo Lafferriere

doi:10.1186/1471-2202-8-s2-s16


Open Access

https://doi.org/10.1186/1471-2202-8-s2-s16


Abstract

An explanatory model is developed to show how synaptic learning mechanisms modeled through spike-timing dependent plasticity (STDP) can result in longer-term adaptations consistent with reinforcement learning models. In particular, the reinforcement learning model known as temporal difference (TD) learning has been used to model neuronal behavior in the orbitofrontal cortex (OFC) and ventral tegmental area (VTA) of macaque monkeys during reinforcement learning. While some research has empirically observed a connection between STDP and TD, there is as yet no explanatory model directly connecting TD to STDP. Through analysis of the STDP rule, the connection between STDP and TD is explained. We further show that an STDP learning rule drives the spike probability of reward-predicting neurons to a stable equilibrium. The equilibrium solution has an increasing slope, where the steepness of the slope predicts the probability of the reward. This connection begins to shed light on more recent data gathered from VTA and OFC that are not well modeled by TD. We suggest that STDP provides the underlying mechanism for explaining reinforcement learning and other higher-level perceptual and cognitive functions.
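
For readers unfamiliar with TD learning, the following is a minimal sketch of the standard textbook tabular TD(0) update on a single cue followed by a probabilistic reward. It is not the authors' STDP model, which the abstract does not specify. The function name `td0_cue_value` and the parameters `alpha`, `gamma`, `n_trials`, and `seed` are illustrative assumptions. The sketch only illustrates how a TD value estimate for a reward-predicting cue can come to track reward probability.

```python
import random


def td0_cue_value(p_reward, n_trials=5000, alpha=0.01, gamma=1.0, seed=0):
    """Textbook TD(0) on a two-step trial: cue -> outcome.

    Assumption (not from the paper): the reward is 1 with probability
    p_reward and 0 otherwise, and the terminal state has value 0.
    """
    rng = random.Random(seed)
    v_cue = 0.0       # learned value of the cue state
    v_terminal = 0.0  # value of the terminal state, fixed at 0
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        # TD error: observed reward plus discounted next value, minus prediction
        delta = reward + gamma * v_terminal - v_cue
        v_cue += alpha * delta
    return v_cue


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(p, round(td0_cue_value(p), 3))
```

Under these assumptions the cue value fluctuates around the reward probability, which is the TD-side behavior that the abstract relates to the equilibrium reached under STDP.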

Highlights

Sixteenth Annual Computational Neuroscience Meeting: CNS*2007 William R Holmes Meeting abstracts – A single PDF containing all abstracts in this Supplement is available here http://www.biomedcentral.com/content/pdf/1471-2202-8-S2-info.pdf

Summary

Introduction


Results

Conclusion