Abstract
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, which steers the changes in appropriate directions. We model a network of spiking neurons as a Partially Observable Markov Decision Process (POMDP) and apply a recently introduced policy learning algorithm from Machine Learning to the network [1]. Based on a stochastic gradient approximation of the average reward, we derive a plasticity rule in the class of Spike-Time-Dependent Plasticity (STDP) rules that ensures convergence to a local maximum of the average reward. The approach applies to a broad class of neuronal models, including the Hodgkin-Huxley model. The resulting update rule is based on the correlation between the reward signal and data available locally at the synapse; these data depend on local activity (e.g., pre- and postsynaptic spikes) and require only mechanisms available at the cellular level. Simulations on several toy problems demonstrate the utility of the approach. As with most stochastic-gradient-based methods, convergence is slow, although the fraction of runs that converge to a global maximum is high. In addition, a statistical analysis shows that the derived plasticity rule is closely related to the widely used BCM rule [2], for which good biological evidence exists. This connection captures the dependence on pre- and postsynaptic spiking rates, and in particular the self-regularizing character of the BCM rule. Compared to previous work, our model is more realistic than that of [3], and the derivation of the update rule applies to a broad class of voltage-based neuronal models, eliminating some of the additional statistical assumptions required in [4]. Finally, the connection between Reinforcement Learning and the BCM rule is, to the best of our knowledge, new.
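To make the flavor of the algorithm concrete, the following is a minimal sketch of a reward-modulated, eligibility-trace update of the policy-gradient type referenced in [1]. Everything here is an illustrative assumption rather than the paper's derived rule: the Bernoulli neuron with a sigmoidal spike probability stands in for the voltage-based models treated in the paper, and the names `eta` (learning rate), `beta` (trace decay), and the toy reward are hypothetical.

```python
# Hedged sketch: reward-modulated plasticity via a GPOMDP-style
# policy gradient. Neuron model, reward, and parameter values are
# illustrative assumptions, not the paper's exact update rule.
import numpy as np

rng = np.random.default_rng(0)

n_pre, T = 20, 5000
w = rng.normal(0.0, 0.1, n_pre)   # synaptic weights
z = np.zeros(n_pre)               # eligibility traces, one per synapse
eta, beta = 0.05, 0.9             # learning rate, trace decay

def spike_prob(u):
    """Sigmoidal escape rate: probability of a postsynaptic spike
    given 'membrane potential' u (stand-in for a voltage-based model)."""
    return 1.0 / (1.0 + np.exp(-u))

for t in range(T):
    x = (rng.random(n_pre) < 0.1).astype(float)  # presynaptic spikes
    p = spike_prob(w @ x)
    y = float(rng.random() < p)                  # stochastic postsynaptic spike
    # Likelihood-ratio term d/dw log Pr(y | x, w) for a Bernoulli neuron,
    # accumulated into a decaying, synapse-local eligibility trace.
    z = beta * z + (y - p) * x
    # Hypothetical toy reward: +1 when the spike matches a target pattern.
    r = 1.0 if y == float(x[:5].sum() > 1) else 0.0
    # Reward-correlated update: stochastic gradient ascent on average reward.
    w += eta * r * z
```

The structural point matches the abstract: the weight change is the product of a global reward signal and a quantity computable purely from pre- and postsynaptic activity at the synapse.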
Highlights
We model a network of spiking neurons as a Partially Observable Markov Decision Process (POMDP) and apply a recently introduced policy learning algorithm from Machine Learning to the network [1].
Based on a stochastic gradient approximation of the average reward, we derive a plasticity rule in the class of Spike-Time-Dependent Plasticity (STDP) rules that ensures convergence to a local maximum of the average reward.
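In generic policy-gradient form (a reconstruction under the same illustrative assumptions as the sketch above, not the paper's exact equations), the rule pairs the reward with a synapse-local eligibility trace:

$$
\Delta w_{ij}(t) = \eta\, r(t)\, z_{ij}(t),
\qquad
z_{ij}(t) = \beta\, z_{ij}(t-1) + \frac{\partial}{\partial w_{ij}} \log \Pr\big(y_j(t) \mid \text{presynaptic input};\, w\big),
$$

so that, in expectation, the weight change follows a stochastic approximation of the gradient of the average reward.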
Summary
From Sixteenth Annual Computational Neuroscience Meeting: CNS*2007, Toronto, Canada. A single PDF containing all abstracts in this Supplement is available at http://www.biomedcentral.com/content/pdf/1471-2202-8-S2-info.pdf
Addresses: 1 IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel; 2 Department of Electrical Engineering, Technion, Haifa 32000, Israel
Email: Ron Meir* - rmeir@ee.technion.ac.il
* Corresponding author