Abstract

We present an efficient and scalable partitioning method for mapping large-scale neural network models with locally dense and globally sparse connectivity onto reconfigurable neuromorphic hardware. Scaling computational efficiency, i.e., the fraction of time spent in actual computation, remains a major challenge in very large networks, and most partitioning algorithms struggle to find a globally optimal partition and map it efficiently onto hardware as network workloads grow. Because communication is the most energy- and time-consuming part of such distributed processing, our partitioning framework is optimized for compute-balanced, memory-efficient parallel processing, targeting low-latency execution and dense synaptic storage with minimal routing across compute cores. We demonstrate highly scalable and efficient partitioning for connectivity-aware, hierarchical address-event-routing resource-optimized mapping, recursively reducing the total communication volume by a significant margin compared to random balanced assignment. We evaluate our method on synthetic networks with varying degrees of sparsity and fan-out, small-world networks, feed-forward networks, and a hemibrain connectome reconstruction of the fruit-fly brain. Together, the method and these practical results suggest a promising path toward scalable, hardware-aware partitioning of very large networks.

Highlights

  • Growing interest in attaining a comprehensive understanding of the brain (Markram et al., 2011; Kandel et al., 2013) motivates large-scale simulation of spiking neural networks (SNNs) on neuromorphic hardware

  • The hierarchical partitioning algorithm was tested with a wide variety of networks in order to show invariance to input topological structure

  • We observed a significant improvement of the hierarchical partitioning method over randomly balanced neuron placement and even flat METIS partitioning


Introduction

There has been growing interest in the scientific community in attaining a comprehensive understanding of the brain (Markram et al., 2011; Kandel et al., 2013), using either in-vivo brain recordings or simulation models based on spiking neural networks (SNNs). Simulating such brain-scale networks (Ananthanarayanan and Modha, 2007), with their massive numbers of neurons and interconnections, is extremely challenging with the computational capability of today's digital multiprocessors. Computing systems with high-bandwidth interconnects between individual compute elements are crucial for such massively distributed processing and for demonstrating performance efficiency at brain scale. Data movement through a Network-on-Chip (NoC) becomes the most challenging part of synchronization and event exchange in many-core spiking processors. This communication becomes the limiting factor in the processing, while the computation scales linearly with the number of cores (Musoles et al., 2019). Distributed computing for spiking networks is most efficient when it combines load balancing, which realizes the lowest-latency computation for any given network, with inter-core connectivity minimization, which ensures the greatest reduction in traffic volume over the network.
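The interplay between load balancing and inter-core connectivity minimization can be illustrated with a toy sketch: a locally dense, globally sparse network is assigned to cores, and a greedy Kernighan-Lin/Fiduccia-Mattheyses-style refinement moves each neuron to the core holding most of its neighbors, subject to a per-core capacity bound. All function names and parameters below are illustrative assumptions, not the paper's actual implementation:

```python
import random

def make_clustered_graph(k, n, p_in=0.9, p_out=0.05, seed=1):
    """Generate k clusters of n neurons each: dense within a cluster,
    sparse between clusters (locally dense, globally sparse)."""
    rng = random.Random(seed)
    total = k * n
    edges = []
    for u in range(total):
        for v in range(u + 1, total):
            same_cluster = (u // n) == (v // n)
            if rng.random() < (p_in if same_cluster else p_out):
                edges.append((u, v))
    return total, edges

def cut_size(edges, part):
    """Count synapses whose endpoints land on different cores
    (a proxy for inter-core communication volume)."""
    return sum(1 for u, v in edges if part[u] != part[v])

def refine(total, edges, part, k, passes=10):
    """Greedy refinement: move each neuron to the core that holds most
    of its neighbours, but never exceed a per-core capacity bound."""
    adj = [[] for _ in range(total)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    size = [0] * k
    for p in part:
        size[p] += 1
    cap = total // k + 1          # load-balance constraint
    for _ in range(passes):
        moved = False
        for u in range(total):
            cnt = [0] * k         # neighbours of u on each core
            for v in adj[u]:
                cnt[part[v]] += 1
            best = max(range(k), key=lambda c: cnt[c])
            # move only if it strictly reduces the cut and fits the cap
            if best != part[u] and cnt[best] > cnt[part[u]] and size[best] < cap:
                size[part[u]] -= 1
                size[best] += 1
                part[u] = best
                moved = True
        if not moved:
            break
    return part

k, n = 4, 16
total, edges = make_clustered_graph(k, n)
random_part = [i % k for i in range(total)]   # balanced random baseline
random.Random(0).shuffle(random_part)
refined_part = refine(total, edges, list(random_part), k)

print("random balanced cut:", cut_size(edges, random_part))
print("refined cut        :", cut_size(edges, refined_part))
```

Because each move requires a strict gain, the cut is monotonically non-increasing, while the capacity check keeps the per-core load within one neuron of perfect balance; on clustered inputs this recovers a partition close to the underlying community structure.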
