Abstract
SpiNNaker is a digital neuromorphic architecture, designed specifically for the low-power simulation of large-scale spiking neural networks at speeds close to biological real-time. Unlike other neuromorphic systems, SpiNNaker allows users to develop their own neuron and synapse models as well as specify arbitrary connectivity. As a result, SpiNNaker has proved to be a powerful tool for studying different neuron models as well as synaptic plasticity, which is believed to be one of the main mechanisms behind learning and memory in the brain. A number of Spike-Timing-Dependent Plasticity (STDP) rules have already been implemented on SpiNNaker and have been shown to be capable of solving various learning tasks in real-time. However, while STDP is an important biological theory of learning, it is a form of Hebbian or unsupervised learning and therefore does not explain behaviors that depend on feedback from the environment. Instead, learning rules based on neuromodulated STDP (three-factor learning rules) have been shown to be capable of solving reinforcement learning tasks in a biologically plausible manner. In this paper we demonstrate for the first time how a model of three-factor STDP, with the third factor representing spikes from dopaminergic neurons, can be implemented on the SpiNNaker neuromorphic system. Using this learning rule, we first show how reward and punishment signals can be delivered to a single synapse before going on to demonstrate it in a larger network which solves the credit assignment problem in a Pavlovian conditioning experiment. Because of its extra complexity, we find that our three-factor learning rule requires approximately 2× as much processing time as the existing SpiNNaker STDP learning rules. However, we show that it is still possible to run our Pavlovian conditioning model with up to 1 × 10^4 neurons in real-time, opening up new research opportunities for modeling behavioral learning on SpiNNaker.
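To make the idea of a three-factor rule concrete, the following is a minimal sketch of one common formulation, in which STDP pairings feed a slowly decaying eligibility trace and the weight only changes while a dopamine signal is present. The abstract does not specify the equations, so the update rule, the time constants `TAU_C` and `TAU_D`, the increment `da_increment`, and the function name `simulate_synapse` are all assumptions chosen for illustration, not the SpiNNaker implementation.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
TAU_C = 1000.0  # eligibility trace decay time constant (ms)
TAU_D = 200.0   # dopamine concentration decay time constant (ms)
DT = 1.0        # simulation time step (ms)


def simulate_synapse(stdp_events, dopamine_spikes, n_steps,
                     w0=0.5, da_increment=0.1):
    """Simulate one synapse under a hypothetical three-factor rule.

    stdp_events: dict mapping time step -> STDP contribution added to the
        eligibility trace at that step (positive for pre-before-post).
    dopamine_spikes: set of time steps at which a dopaminergic spike arrives.
    Returns the final synaptic weight.
    """
    c = 0.0  # eligibility trace (factors one and two: pre/post spike timing)
    d = 0.0  # dopamine concentration (factor three)
    w = w0
    for step in range(n_steps):
        # STDP pairings tag the synapse instead of changing the weight.
        c += stdp_events.get(step, 0.0)
        if step in dopamine_spikes:
            d += da_increment
        # The weight moves only when the trace and dopamine overlap in time.
        w += c * d * DT
        # Both quantities decay exponentially between events.
        c *= math.exp(-DT / TAU_C)
        d *= math.exp(-DT / TAU_D)
    return w
```

For example, `simulate_synapse({10: 0.01}, {100}, 500)` returns a weight above the initial 0.5, because the dopamine spike at step 100 arrives while the trace left by the pairing at step 10 is still non-zero. With no dopamine spikes, the same pairing leaves the weight unchanged. The slow decay of the trace is what lets a delayed reward be credited to earlier spike pairings.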
Highlights
One of the earliest and most famous hypotheses on when synaptic plasticity occurs came from Donald Hebb, who postulated (Hebb, 1949): “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” In the context of the changing strengths of existing synaptic connections, rather than the formation of new synapses, Hebb’s postulate suggests that connections between neurons whose activity is causally related will be strengthened.
Reinforcement learning is a biologically inspired learning paradigm where an agent learns by interacting with the world around it and modifies its behavior based on sparse feedback.
Reinforcement learning has been shown to work effectively in convolutional neural networks (Mnih et al, 2015). It remains unclear how these techniques could be replicated in spiking neural networks and whether classical reinforcement learning (Sutton and Barto, 1998) is at all analogous to dopamine-modulated synaptic plasticity in the brain (Reynolds et al, 2001; Pawlak and Kerr, 2008).
Summary
In the context of the changing strengths of existing synaptic connections, rather than the formation of new synapses, Hebb’s postulate suggests that connections between neurons whose activity is causally related will be strengthened. Using improved experimental techniques that became available in the 1990s, Markram et al (1997) showed that the magnitude of the changes in synaptic strength caused by Hebbian learning was related to the timing of pre- and post-synaptic spikes. The relationship between the magnitude of these changes and the relative timing of the pre- and post-synaptic spikes became known as Spike-Timing-Dependent Plasticity (STDP), and the data recorded by Bi and Poo (1998) suggest that it reinforces causality between the firing of the pre- and post-synaptic neurons. When a pre-synaptic spike arrives before a post-synaptic spike is emitted, the synapse is potentiated (strengthened). In the ensuing years STDP has been widely used to solve many tasks using biologically plausible spiking neural networks (Gerstner et al, 1996; Song et al, 2000; Davison and Frégnac, 2006).
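The timing dependence described above can be sketched with a pair-based STDP window. The exponential shape used here is a widely used simplification, and the amplitudes `A_PLUS` and `A_MINUS`, the time constants `TAU_PLUS` and `TAU_MINUS`, and the function name `stdp_weight_change` are assumptions made for illustration. They are not parameters taken from the text.

```python
import math

# Illustrative parameters (assumptions, not values from the paper).
A_PLUS = 0.01     # maximum potentiation for a single spike pair
A_MINUS = 0.012   # maximum depression for a single spike pair
TAU_PLUS = 20.0   # potentiation window time constant (ms)
TAU_MINUS = 20.0  # depression window time constant (ms)


def stdp_weight_change(t_pre, t_post):
    """Return the weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:
        # Pre-synaptic spike arrived first: causal pairing, so potentiate.
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:
        # Post-synaptic spike came first: anti-causal pairing, so depress.
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    # Simultaneous spikes are treated as producing no change in this sketch.
    return 0.0
```

Under these assumptions, a pre-synaptic spike at 10 ms followed by a post-synaptic spike at 15 ms yields a positive change, the reverse order yields a negative change, and the magnitude shrinks as the two spikes move further apart in time.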