DFSynthesizer: Dataflow-based Synthesis of Spiking Neural Networks to Neuromorphic Hardware
Spiking Neural Networks (SNNs) are an emerging computation model that uses event-driven activation and bio-inspired learning algorithms. SNN-based machine learning programs are typically executed on tile-based neuromorphic hardware platforms, where each tile consists of a computation unit called a crossbar, which maps neurons and synapses of the program. However, synthesizing such programs on an off-the-shelf neuromorphic hardware is challenging. This is because of the inherent resource and latency limitations of the hardware, which impact both model performance, e.g., accuracy, and hardware performance, e.g., throughput. We propose DFSynthesizer, an end-to-end framework for synthesizing SNN-based machine learning programs to neuromorphic hardware. The proposed framework works in four steps. First, it analyzes a machine learning program and generates SNN workload using representative data. Second, it partitions the SNN workload and generates clusters that fit on crossbars of the target neuromorphic hardware. Third, it exploits the rich semantics of the Synchronous Dataflow Graph (SDFG) to represent a clustered SNN program, allowing for performance analysis in terms of key hardware constraints such as number of crossbars, dimension of each crossbar, buffer space on tiles, and tile communication bandwidth. Finally, it uses a novel scheduling algorithm to execute clusters on crossbars of the hardware, guaranteeing hardware performance. We evaluate DFSynthesizer with 10 commonly used machine learning programs. Our results demonstrate that DFSynthesizer provides a much tighter performance guarantee compared to current mapping approaches.
- Conference Article
35
- 10.1109/isvlsi.2019.00043
- Jul 1, 2019
Spiking neural networks (SNN) are efficient computation models to infer spacio-temporal pattern recognition applications on neuromorphic hardware. Neuromorphic hardware are typically designed using interconnected crossbars, with each crossbar containing a structure of fully connected neurons. In order to ensure application performance such as accuracy and system performance such as throughput and resource utilization, SNNs need to be efficiently mapped on neuromorphic hardware. To address this, we propose a design flow to partition and map SNN-based applications on neuromorphic hardware, with an aim to enhance application and system performance. The design flow operates in two steps : (1) a two-step clustering technique to partition trained SNNs into clusters of neurons and synapses, with an aim to minimize inter-cluster spike communication, (2) mapping and scheduling the clusters on to crossbars-based architectures, modeled using Synchronous Data-flow Graphs (SDFGs). The SDFG model incorporates hardware constraints such as I/O bandwidth of crossbars and synaptic memory while analyzing the throughput of the modeled system. Our design-flow integrates CARLsim, a GPU-accelerated application-level SNN simulator with SDF3, a tool to map SDFG on hardware. We evaluate the design-flow using synthetic and realistic SNN-based applications. We show that, for throughput constrained applications, we achieve a 21.74% and 15.03% reduction in memory usage and utilization of the time-multiplexed interconnect, compared to a state of the art approach.
- Conference Article
62
- 10.1145/3372799.3394364
- Jun 16, 2020
Machine learning applications that are implemented with spike-based computation model, e.g., Spiking Neural Network (SNN), have a great potential to lower the energy consumption when they are executed on a neuromorphic hardware. However, compiling and mapping an SNN to the hardware is challenging, especially when compute and storage resources of the hardware (viz. crossbar) need to be shared among the neurons and synapses of the SNN. We propose an approach to analyze and compile SNNs on a resource-constrained neuromorphic hardware, providing guarantee on key performance metrics such as execution time and throughput. Our approach makes the following three key contributions. First, we propose a greedy technique to partition an SNN into clusters of neurons and synapses such that each cluster can fit on to the resources of a crossbar. Second, we exploit the rich semantics and expressiveness of Synchronous Dataflow Graphs (SDFGs) to represent a clustered SNN and analyze its performance using Max-Plus Algebra, considering the available compute and storage capacities, buffer sizes, and communication bandwidth. Third, we propose a self-timed execution-based fast technique to compile and admit SNN-based applications to a neuromorphic hardware at run-time, adapting dynamically to the available resources on the hardware. We evaluate our approach with standard SNN-based applications and demonstrate a significant performance improvement compared to current practices.
- Conference Article
40
- 10.1145/3194554.3194627
- May 30, 2018
Spiking Neural Networks (SNNs) are powerful computation engines for pattern recognition and image classification applications. Apart from application performance such as recognition and classification accuracy, system performance such as throughput becomes important when executing these applications on a hardware. We propose a systematic design-flow to map SNN-based applications on a crossbar-based neuromorphic hardware, guaranteeing application as well as system performance. Synchronous Dataflow Graphs (SDFGs) are used to model these applications with extended semantics to represent neural network topologies. Self-timed scheduling is then used to analyze throughput, incorporating hardware constraints such as synaptic memory, communication and I/O bandwidth of crossbars. Our design-flow integrates CARLsim, a GPU-accelerated application-level SNN simulator with SDF3, a tool for mapping SDFG on hardware. We conducted experiments with realistic and synthetic SNNs on representative neuromorphic hardware, demonstrating throughput-resource trade-offs for a given application performance. For throughput-constrained applications, we show average 20% reduction of hardware usage with 19% reduction in energy consumption. For throughput-scalable applications, we show an average 53% higher throughput compared to a state-of-the-art approach.
- Conference Article
8
- 10.1109/aicas54282.2022.9869963
- Jun 13, 2022
Recurrent spiking neural networks (SNNs) are inspired by the working principles of biological nervous systems that offer unique temporal dynamics and event-based processing. Recently, the error backpropagation through time (BPTT) algorithm has been successfully employed to train SNNs offline, with comparable performance to artificial neural networks (ANNs) on complex tasks. However, BPTT has severe limitations for online learning scenarios of SNNs where the network is required to simultaneously process and learn from incoming data. Specifically, as BPTT separates the inference and update phases, it would require to store all neuronal states for calculating the weight updates backwards in time. To address these fundamental issues, alternative credit assignment schemes are required. Within this context, neuromorphic hardware (NMHW) implementations of SNNs can greatly benefit from in-memory computing (IMC) concepts that follow the brain-inspired collocation of memory and processing, further enhancing their energy efficiency. In this work, we utilize a biologically-inspired local and online training algorithm compatible with IMC, which approximates BPTT, e-prop, and present an approach to support both inference and training of a recurrent SNN using NMHW. To do so, we embed the SNN weights on an in-memory computing NMHW with phase-change memory (PCM) devices and integrate it into a hardware-in-the-loop training setup. We develop our approach with respect to limited precision and imperfections of the analog devices using a PCM-based simulation framework and a NMHW consisting of in-memory computing cores fabricated in 14nm CMOS technology with 256×256 PCM crossbar arrays. We demonstrate that our approach is robust even to 4-bit precision and achieves competitive performance to a floating-point 32-bit realization, while simultaneously equipping the SNN with online training capabilities and exploiting the acceleration benefits of NMHW.
- Research Article
72
- 10.1162/neco_a_01499
- May 19, 2022
- Neural Computation
Artificial neural networks (ANNs) have experienced a rapid advancement for their success in various application domains, including autonomous driving and drone vision. Researchers have been improving the performance efficiency and computational requirement of ANNs inspired by the mechanisms of the biological brain. Spiking neural networks (SNNs) provide a power-efficient and brain-inspired computing paradigm for machine learning applications. However, evaluating large-scale SNNs on classical von Neumann architectures (central processing units/graphics processing units) demands a high amount of power and time. Therefore, hardware designers have developed neuromorphic platforms to execute SNNs in and approach that combines fast processing and low power consumption. Recently, field-programmable gate arrays (FPGAs) have been considered promising candidates for implementing neuromorphic solutions due to their varied advantages, such as higher flexibility, shorter design, and excellent stability. This review aims to describe recent advances in SNNs and the neuromorphic hardware platforms (digital, analog, hybrid, and FPGA based) suitable for their implementation. We present that biological background of SNN learning, such as neuron models and information encoding techniques, followed by a categorization of SNN training. In addition, we describe state-of-the-art SNN simulators. Furthermore, we review and present FPGA-based hardware implementation of SNNs. Finally, we discuss some future directions for research in this field.
- Book Chapter
19
- 10.1007/978-3-031-19775-8_4
- Jan 1, 2022
Brain-inspired spiking neural networks (SNNs) have recently drawn more and more attention due to their event-driven and energy-efficient characteristics. The integration of storage and computation paradigm on neuromorphic hardwares makes SNNs much different from Deep Neural Networks (DNNs). In this paper, we argue that SNNs may not benefit from the weight-sharing mechanism, which can effectively reduce parameters and improve inference efficiency in DNNs, in some hardwares, and assume that an SNN with unshared convolution kernels could perform better. Motivated by this assumption, a training-inference decoupling method for SNNs named as Real Spike is proposed, which not only enjoys both unshared convolution kernels and binary spikes in inference-time but also maintains both shared convolution kernels and Real-valued Spikes during training. This decoupling mechanism of SNN is realized by a re-parameterization technique. Furthermore, based on the training-inference-decoupled idea, a series of different forms for implementing Real Spike on different levels are presented, which also enjoy shared convolutions in the inference and are friendly to both neuromorphic and non-neuromorphic hardware platforms. A theoretical proof is given to clarify that the Real Spike-based SNN network is superior to its vanilla counterpart. Experimental results show that all different Real Spike versions can consistently improve the SNN performance. Moreover, the proposed method outperforms the state-of-the-art models on both non-spiking static and neuromorphic datasets. KeywordsSpiking neural networkReal spikeBinary spikeTraining-inference-decoupledRe-parameterization
- Research Article
739
- 10.3389/fnins.2018.00774
- Oct 25, 2018
- Frontiers in Neuroscience
Spiking neural networks (SNNs) are inspired by information processing in biology, where sparse and asynchronous binary signals are communicated and processed in a massively parallel fashion. SNNs on neuromorphic hardware exhibit favorable properties such as low power consumption, fast inference, and event-driven information processing. This makes them interesting candidates for the efficient implementation of deep neural networks, the method of choice for many machine learning tasks. In this review, we address the opportunities that deep spiking networks offer and investigate in detail the challenges associated with training SNNs in a way that makes them competitive with conventional deep learning, but simultaneously allows for efficient mapping to hardware. A wide range of training methods for SNNs is presented, ranging from the conversion of conventional deep networks into SNNs, constrained training before conversion, spiking variants of backpropagation, and biologically motivated variants of STDP. The goal of our review is to define a categorization of SNN training methods, and summarize their advantages and drawbacks. We further discuss relationships between SNNs and binary networks, which are becoming popular for efficient digital hardware implementation. Neuromorphic hardware platforms have great potential to enable deep spiking networks in real-world applications. We compare the suitability of various neuromorphic systems that have been developed over the past years, and investigate potential use cases. Neuromorphic approaches and conventional machine learning should not be considered simply two solutions to the same classes of problems, instead it is possible to identify and exploit their task-specific advantages. Deep SNNs offer great opportunities to work with new types of event-based sensors, exploit temporal codes and local on-chip learning, and we have so far just scratched the surface of realizing these advantages in practical applications.
- Conference Article
17
- 10.1109/igsc51522.2020.9291228
- Oct 19, 2020
Neuromorphic computing offers one path forward for AI at the edge. However, accessing and effectively utilizing a neuromorphic hardware platform is non-trivial. In this work, we present a complete pipeline for neuromorphic computing at the edge, including a small, inexpensive, low-power, FPGA-based neuromorphic hardware platform, a training algorithm for designing spiking neural networks for neuromorphic hardware, and a software framework for connecting those components. We demonstrate this pipeline on a real-world application, engine control for a spark-ignition internal combustion engine. We illustrate how we connect engine simulations with neuromorphic hardware simulations and training software to produce hardware-compatible spiking neural networks that perform engine control to improve fuel efficiency. We present initial results on the performance of these spiking neural networks and illustrate that they outperform open-loop engine control. We also give size, weight, and power estimates for a deployed solution of this type.
- Research Article
16
- 10.1016/j.neucom.2023.126885
- Oct 7, 2023
- Neurocomputing
Spiking Neural Networks (SNN) promise extremely low-power and low-latency inference on neuromorphic hardware. Recent studies demonstrate the competitive performance of SNNs compared with Artificial Neural Networks (ANN) in conventional classification tasks.In this work, we present an energy-efficient implementation of a Reinforcement Learning (RL) algorithm using SNNs to solve an obstacle avoidance task performed by an Unmanned Aerial Vehicle (UAV), taking a Dynamic Vision Sensor (DVS) as event-based input. We train the SNN directly, improving upon state-of-art implementations based on hybrid (not directly trained) SNNs. For this purpose, we devise an adaptation of the Spatio-Temporal Backpropagation algorithm (STBP) for RL. We then compare the SNN with a state-of-art Convolutional Neural Network (CNN) designed to solve the same task. To this aim, we train both networks by exploiting a photorealistic training pipeline based on AirSim. To achieve a realistic latency and throughput assessment for embedded deployment, we designed and trained three different embedded SNN versions to be executed on state-of-art neuromorphic hardware, targeting state-of-the-art.We compared SNN and CNN in terms of obstacle avoidance performance showing that the SNN algorithm achieves better results than the CNN with a factor of 6× less energy. We also characterize the different SNN hardware implementations in terms of energy and spiking activity.
- Conference Article
4
- 10.5555/2555692.2555712
- Sep 29, 2013
Large-scale spiking neural networks (SNNs) have been used to successfully model complex neural circuits that explore various neural phenomena such as learning and memory, vision systems, auditory systems, neural oscillations, and many other important topics of neural function. Additionally, SNNs are particularly well-adapted to run on neuromorphic hardware as spiking events are often sparse, leading to a potentially large reduction in both bandwidth requirements and power usage. The inclusion of realistic plasticity equations, neural dynamics, and recurrent topologies has increased the descriptive power of SNNs but has also made the task of tuning these biologically realistic SNNs difficult. We present an automated parameter-tuning framework capable of tuning large-scale SNNs quickly and efficiently using evolutionary algorithms (EA) and off-the-shelf graphics processing units (GPUs).To test the feasibility of an automated parameter-tuning framework, our group used EAs to tune open parameters in SNNs running concurrently on a GPU. The SNNs were evolved to produce orientation-dependent stimulus responses similar to those found in simple cells of the primary visual cortex (V1) through the formation of self-organizing receptive fields (SORFs). The general evolutionary approach was as follows: A population of neural networks was created, each with a unique set of neural parameter values that defined overall behavior. Each SNN was then ranked based on a fitness value assigned by an objective function in which higher fitness values were given to SNNs that (a) reproduced responses observed in primate visual cortex, and (b) spanned the stimulus space, and (c) had sparse firing rates. The highest ranked individuals were selected, recombined, and mutated to form the offspring for the next generation. This process continued until a desired fitness was reached or until other termination conditions were met (Figure 1a).The automated parameter-tuning framework consisted of three software packages. The framework included: (a) the CARLsim SNN simulator [1], (2) the Evolving Objects (EO) computational framework [2], and (3) a parameter-tuning interface (PTI), developed by our group, to provide an interface between CARLsim and EO (See Figure 1b). The EO computational framework ran the evolutionary algorithm on the user-designated parameters of SNNs in CARLsim. The PTI allowed the objective function to be calculated independent of the EO computation framework. Parameter values were passed from the EO computation framework through the PTI to the SNN in CARLsim where the objective function is calculated. After the objective function was executed, the results were passed from the SNN in CARLsim through the PTI back to the EO computation framework for processing by the EA. With this approach, the fitness function calculation, which involved running each SNN in the population, could be run in parallel on the GPU while the remainder of EA calculations can be performed using the CPU (Figure 1b).A sample SNN with 4,104 neurons was tuned to respond with V1 simple cell-like tuning curves and produce SORFs. A performance analysis comparing the GPU-accelerated implementation to a single-threaded CPU implementation was carried out and showed that the GPU implementation could achieve a 65 times speedup over the CPU implementation. Additionally, the parameter value solutions found in the tuned SNN were stable and robust.The automated parameter-tuning framework presented here will be of use to both the computational neuroscience and neuromorphic engineering communities, making the process of constructing and tuning large-scale SNNs much quicker and easier.
- Research Article
24
- 10.1016/j.neucom.2021.06.070
- Jun 24, 2021
- Neurocomputing
Direct training of hardware-friendly weight binarized spiking neural network with surrogate gradient learning towards spatio-temporal event-based dynamic data recognition
- Research Article
27
- 10.1016/j.knosys.2022.110193
- Dec 14, 2022
- Knowledge-Based Systems
Bio-inspired Active Learning method in spiking neural network
- Research Article
162
- 10.1109/tvlsi.2019.2951493
- Dec 5, 2019
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Neuromorphic hardware implements biological neurons and synapses to execute a spiking neural network (SNN)-based machine learning. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition an SNN into clusters of synapses, where intracluster local synapses are mapped within crossbars of the hardware and intercluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion and improves application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a metaheuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on a state-of-the-art neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and spike latency by 21%, compared to the best-performing SNN mapping technique.
- Research Article
7
- 10.3389/fninf.2022.883360
- May 30, 2022
- Frontiers in Neuroinformatics
Neuromorphic hardware is based on emulating the natural biological structure of the brain. Since its computational model is similar to standard neural models, it could serve as a computational accelerator for research projects in the field of neuroscience and artificial intelligence, including biomedical applications. However, in order to exploit this new generation of computer chips, we ought to perform rigorous simulation and consequent validation of neuromorphic models against their conventional implementations. In this work, we lay out the numeric groundwork to enable a comparison between neuromorphic and conventional platforms. “Loihi”—Intel's fifth generation neuromorphic chip, which is based on the idea of Spiking Neural Networks (SNNs) emulating the activity of neurons in the brain, serves as our neuromorphic platform. The work here focuses on Leaky Integrate and Fire (LIF) models based on neurons in the mouse primary visual cortex and matched to a rich data set of anatomical, physiological and behavioral constraints. Simulations on classical hardware serve as the validation platform for the neuromorphic implementation. We find that Loihi replicates classical simulations very efficiently with high precision. As a by-product, we also investigate Loihi's potential in terms of scalability and performance and find that it scales notably well in terms of run-time performance as the simulated networks become larger.
- Research Article
7
- 10.3389/fnins.2022.883360
- May 30, 2022
- Frontiers in neuroscience
Neuromorphic hardware is based on emulating the natural biological structure of the brain. Since its computational model is similar to standard neural models, it could serve as a computational accelerator for research projects in the field of neuroscience and artificial intelligence, including biomedical applications. However, in order to exploit this new generation of computer chips, we ought to perform rigorous simulation and consequent validation of neuromorphic models against their conventional implementations. In this work, we lay out the numeric groundwork to enable a comparison between neuromorphic and conventional platforms. “Loihi”—Intel's fifth generation neuromorphic chip, which is based on the idea of Spiking Neural Networks (SNNs) emulating the activity of neurons in the brain, serves as our neuromorphic platform. The work here focuses on Leaky Integrate and Fire (LIF) models based on neurons in the mouse primary visual cortex and matched to a rich data set of anatomical, physiological and behavioral constraints. Simulations on classical hardware serve as the validation platform for the neuromorphic implementation. We find that Loihi replicates classical simulations very efficiently with high precision. As a by-product, we also investigate Loihi's potential in terms of scalability and performance and find that it scales notably well in terms of run-time performance as the simulated networks become larger.