Abstract
Spiking neural networks (SNNs) offer many advantages over traditional artificial neural networks (ANNs), such as biological plausibility, fast information processing, and energy efficiency. Although SNNs have been used to solve a variety of control tasks using the Spike-Timing-Dependent Plasticity (STDP) learning rule, existing solutions usually involve hard-coded network architectures that solve specific tasks rather than addressing different kinds of tasks in a general way. This neglects one of the biggest advantages of ANNs, namely that they are general-purpose and easy to use thanks to a simple network architecture, which usually consists of an input layer, one or more hidden layers, and an output layer. This paper addresses the problem by introducing an end-to-end learning approach for spiking neural networks constructed with one hidden layer and reward-modulated Spike-Timing-Dependent Plasticity (R-STDP) synapses connected in an all-to-all fashion. We use the supervised R-STDP learning rule to train two different SNN-based sub-controllers to replicate desired obstacle-avoidance and goal-approaching behaviors, provided by pre-generated datasets. Together they make up a target-reaching controller, which is used to control a simulated mobile robot to reach a target area while avoiding obstacles in its path. We demonstrate the performance and effectiveness of our trained SNNs on target-reaching tasks in different unknown scenarios.
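For illustration only, and not taken from the paper's released code: the minimal Python/NumPy sketch below shows how a supervised R-STDP update of the kind described in the abstract could look for a network with one hidden layer and all-to-all synapses. The network sizes, the rate-based surrogate for spike activity, the eligibility-trace decay, and the way the per-output-neuron reward is derived from the pre-generated dataset are all assumptions made for this sketch; reward assignment to upstream (input-to-hidden) synapses is not shown.

    import numpy as np

    # Assumed dimensions: sensor inputs, one hidden layer, two motor outputs.
    N_IN, N_HID, N_OUT = 16, 8, 2

    rng = np.random.default_rng(0)
    w_ih = rng.uniform(0.0, 1.0, (N_IN, N_HID))   # input -> hidden, all-to-all
    w_ho = rng.uniform(0.0, 1.0, (N_HID, N_OUT))  # hidden -> output, all-to-all

    def rstdp_step(pre_rates, post_rates, elig, w, reward, lr=1e-3, tau_e=0.95):
        """One reward-modulated update: the eligibility trace accumulates a
        pre/post activity correlation (a rate-based stand-in for STDP) and is
        converted into a weight change only when a reward signal arrives."""
        elig = tau_e * elig + np.outer(pre_rates, post_rates)
        w = w + lr * reward * elig
        return np.clip(w, 0.0, 1.0), elig

    def reward_from_dataset(actual_counts, desired_counts):
        """Hypothetical supervised reward: compare the network's output spike
        counts with the action recorded in the dataset for the same input."""
        return desired_counts - actual_counts

    # Example: update only the hidden->output weights after one interval.
    hid_rates = rng.random(N_HID)        # hypothetical hidden-layer firing rates
    out_counts = np.array([3.0, 7.0])    # actual output spike counts
    desired = np.array([8.0, 2.0])       # desired counts from the dataset sample
    elig_ho = np.zeros((N_HID, N_OUT))
    w_ho, elig_ho = rstdp_step(hid_rates, out_counts, elig_ho, w_ho,
                               reward_from_dataset(out_counts, desired))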
Highlights
We demonstrate the capabilities and efficiency of our proposed end-to-end learning of spiking neural networks based on supervised reward-modulated Spike-Timing-Dependent Plasticity (R-STDP) by performing target-reaching tasks with a simulated vehicle.
Teaching a brain-inspired spiking neural network in a general and easy way is not simple. We tackled this problem by proposing an end-to-end learning rule based on the supervised reward-modulated Spike-Timing-Dependent Plasticity (R-STDP) rule and used it to train two spiking neural networks (SNNs) for an autonomous target-tracking implementation.
By changing the inputs fed into the network and slightly adjusting the way the reward was assigned to the output neurons, the two SNNs successfully learned to exhibit the desired behaviors, and the robot was able to reach a previously set target area while avoiding obstacles.
Summary
Despite the success of traditional artificial neural networks (ANNs) in learning complex non-linear functions, interest in spiking neural networks (SNNs) is steadily increasing because SNNs offer many fundamental and inherent advantages over traditional ANNs, such as biological plausibility (Maass, 1997), rapid information processing (Thorpe et al., 2001; Wysoski et al., 2010), and energy efficiency (Drubach, 2000; Cassidy et al., 2014). SNN-based controllers could therefore help mobile robots cope with their limited on-board computing resources and power supply. Training these kinds of networks, however, is notoriously difficult, since the error back-propagation mechanism commonly used in conventional neural networks cannot be directly transferred to SNNs due to the non-differentiability at spike times. Neuroscience studies reveal that the brain modifies the outcome of STDP synapses using one or more chemicals (neuromodulators) released by certain neurons. This mechanism inspires a new method for training SNNs, known as reward-modulated spike-timing-dependent plasticity (R-STDP) (Izhikevich, 2007).

To this end, our paper explores an indirect SNN training approach based on the R-STDP learning rule and a supervised learning framework. The SNN with the transferred policy can in turn control the robot in an energy-efficient way, for example by running on neuromorphic hardware (Blum et al., 2017). All of our code and experiment demos can be found in the Supplementary Files.
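To make the R-STDP mechanism referenced above (Izhikevich, 2007) concrete, the sketch below pairs a decaying eligibility trace, driven by pre/post spike timing through a classic exponential STDP window, with a scalar reward that gates the actual weight change. All constants, function names, and the per-step spike-pair representation are hypothetical choices for this illustration, not values from the paper.

    import numpy as np

    def stdp_window(dt, a_plus=1.0, a_minus=-1.0, tau=20.0):
        """Exponential STDP window: potentiation when the presynaptic spike
        precedes the postsynaptic one (dt > 0), depression otherwise."""
        return a_plus * np.exp(-dt / tau) if dt > 0 else a_minus * np.exp(dt / tau)

    def simulate_synapse(spike_pairs, rewards, w=0.5, tau_c=50.0, lr=0.01):
        """spike_pairs: per-step timing differences (t_post - t_pre) in ms.
        rewards: scalar reward delivered at each step (0 when absent).
        The eligibility trace c decays with tau_c and is converted into a
        weight change only when the reward (dopamine-like) signal is non-zero."""
        c = 0.0
        for dt, r in zip(spike_pairs, rewards):
            c = c * np.exp(-1.0 / tau_c) + stdp_window(dt)
            w += lr * r * c
        return w

    # Example: correlated pre-before-post spiking, with reward arriving late.
    print(simulate_synapse(spike_pairs=[5.0, 3.0, 8.0], rewards=[0.0, 0.0, 1.0]))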