Abstract

Spiking neural networks (SNNs) represent the third generation of neural networks and are expected to enable new classes of machine learning applications. However, evaluating large-scale SNNs (e.g., of the scale of the visual cortex) on power-constrained systems requires significant improvements in computing efficiency. A unique attribute of SNNs is their event-driven nature: information is encoded as a series of spikes, and work is dynamically generated as spikes propagate through the network. Therefore, parallel implementations of SNNs on multi-cores and GPGPUs are severely limited by communication and synchronization overheads. Recent years have seen great interest in deep learning accelerators for non-spiking neural networks; however, these architectures are not well suited to the dynamic, irregular parallelism in SNNs. Prior efforts on specialized SNN hardware utilize spatial architectures, wherein each neuron is allocated a dedicated processing element, and large networks are realized by connecting multiple chips into a system. While suitable for large-scale systems, this approach is not a good match for size- or cost-constrained mobile devices. We propose PEASE, a Programmable Event-driven processor Architecture for SNN Evaluation. PEASE comprises Spike Processing Units (SPUs) that are dynamically scheduled to execute computations triggered by a spike. Instructions to the SPUs are dynamically generated by Spike Schedulers (SSs) that utilize event queues to track unprocessed spikes and identify neurons that need to be evaluated. The memory hierarchy in PEASE is fully software managed, and the processing elements are interconnected using a two-tiered bus-ring topology matching the communication characteristics of SNNs. We propose a method to map any given SNN to PEASE such that the workload is balanced across SPUs and SPU clusters, while pipelining across layers of the network to improve performance.
We implemented PEASE at the RTL level and synthesized it to IBM 45 nm technology. Across 6 SNN benchmarks, our 64-SPU configuration of PEASE achieves 7.1×-17.5× and 2.6×-5.8× speedups, respectively, over software implementations on an Intel Xeon E5-2680 CPU and NVIDIA Tesla K40C GPU. The energy reductions over the CPU and GPU are 71×-179× and 198×-467×, respectively.
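The event-driven evaluation model the abstract describes — an event queue holding unprocessed spikes, with neuron updates generated only when a spike arrives — can be sketched in software. The sketch below is illustrative only, assuming a simple integrate-and-fire neuron model; the `Neuron` class and `run` function are hypothetical names and do not reflect PEASE's actual microarchitecture or scheduler logic.

```python
from collections import deque

class Neuron:
    """Minimal integrate-and-fire neuron (an assumed model, for illustration)."""
    def __init__(self, threshold=1.0):
        self.potential = 0.0
        self.threshold = threshold
        self.fanout = []  # list of (target_neuron_id, synaptic_weight)

def run(neurons, initial_spikes):
    """Event-driven evaluation: only neurons receiving spikes are updated."""
    queue = deque(initial_spikes)  # event queue of unprocessed spikes
    fired = []
    while queue:
        src = queue.popleft()
        for dst, w in neurons[src].fanout:
            n = neurons[dst]
            n.potential += w              # integrate the incoming spike
            if n.potential >= n.threshold:
                n.potential = 0.0         # reset after firing
                fired.append(dst)
                queue.append(dst)         # new work generated dynamically
    return fired

# A two-neuron example: a spike at neuron 0 pushes neuron 1 over threshold.
neurons = {0: Neuron(), 1: Neuron(threshold=0.5)}
neurons[0].fanout = [(1, 0.6)]
print(run(neurons, [0]))  # → [1]
```

The key property this captures is the one the abstract highlights: the amount of work is proportional to the number of spikes, not the number of neurons, which is why dynamic scheduling (rather than dedicating a processing element per neuron) suits sparse spiking activity.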
