Abstract

Large-scale neural network (NN) accelerators typically consist of several processing nodes, each of which can be implemented as a multi- or many-core chip organized via a network-on-chip (NoC) to handle the heavy neuron-to-neuron traffic. Multiple NoC-based NN chips are connected through chip-to-chip interconnection networks to further boost the overall neural acceleration capability. Huge amounts of multicast-based traffic travel on-chip or across chips, making the interconnection network design more challenging and turning it into the bottleneck of NN system performance and energy. In this article, we propose NeuronLink, a set of coupled intrachip and interchip communication techniques for NN accelerators. For intrachip communication, we propose scoring crossbar arbitration, arbitration interception, and route computation parallelization techniques for virtual-channel routing, leading to a high-throughput NoC with lower hardware cost for multicast-based traffic. For interchip communication, we propose a lightweight and NoC-aware chip-to-chip interconnection scheme, enabling efficient interconnection of NoC-based NN chips. In addition, we evaluate the proposed techniques on four connected NoC-based deep neural network (DNN) chips implemented on four field-programmable gate arrays (FPGAs). The experimental results show that the proposed interconnection network manages the data traffic inside DNNs with higher throughput and lower overhead than state-of-the-art interconnects.
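The abstract names scoring crossbar arbitration as one of the intrachip techniques but does not specify the scoring rule. The following is a minimal sketch of the general idea of score-based crossbar arbitration, assuming a hypothetical score per request (e.g., derived from packet age or multicast fan-out); it is an illustration of the concept, not the paper's actual arbiter design.

```python
# Hedged sketch of score-based crossbar arbitration (not the paper's design):
# each output port independently grants the highest-scoring requesting input.

def score_arbitrate(requests):
    """requests: dict mapping output port -> list of (input_port, score).
    Returns a dict mapping each output port to the granted input port,
    or None if no input requested that output."""
    grants = {}
    for out_port, reqs in requests.items():
        if reqs:
            # Grant the input with the highest score; break ties in favor
            # of the lower input-port id so the decision is deterministic.
            grants[out_port] = max(reqs, key=lambda r: (r[1], -r[0]))[0]
        else:
            grants[out_port] = None
    return grants
```

In a hardware crossbar, each output port's `max` selection would be a parallel comparator tree rather than a sequential loop; the sketch only captures the grant decision itself.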
