Abstract

Large-scale neural network (NN) accelerators typically consist of several processing nodes, each of which can be implemented as a multi- or many-core chip organized via a network-on-chip (NoC) to handle the heavy neuron-to-neuron traffic. Multiple NoC-based NN chips are connected through chip-to-chip interconnection networks to further boost the overall neural acceleration capability. Huge amounts of multicast-based traffic travel on-chip or across chips, making the interconnection network design more challenging and turning it into the bottleneck of NN system performance and energy. In this article, we propose NeuronLink, a set of coupled intrachip and interchip communication techniques for NN accelerators. For intrachip communication, we propose scoring crossbar arbitration, arbitration interception, and route computation parallelization techniques for virtual-channel routing, leading to a high-throughput NoC with lower hardware cost for multicast-based traffic. For interchip communication, we propose a lightweight and NoC-aware chip-to-chip interconnection scheme, enabling efficient interconnection of NoC-based NN chips. In addition, we evaluate the proposed techniques on four connected NoC-based deep neural network (DNN) chips implemented on four field-programmable gate arrays (FPGAs). The experimental results show that the proposed interconnection network manages the data traffic inside DNNs with higher throughput and lower overhead than state-of-the-art interconnects.
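The abstract names scoring crossbar arbitration as one of the intrachip techniques but does not specify the scoring rule. The following is a minimal sketch of the general idea of score-based crossbar arbitration, assuming a hypothetical score per request (e.g., derived from packet age or multicast fan-out); it is an illustration of the concept, not the paper's actual arbiter design.

```python
# Hedged sketch of score-based crossbar arbitration (not the paper's design):
# each output port independently grants the highest-scoring requesting input.

def score_arbitrate(requests):
    """requests: dict mapping output port -> list of (input_port, score).
    Returns a dict mapping each output port to the granted input port,
    or None if no input requested that output."""
    grants = {}
    for out_port, reqs in requests.items():
        if reqs:
            # Grant the input with the highest score; break ties in favor
            # of the lower input-port id so the decision is deterministic.
            grants[out_port] = max(reqs, key=lambda r: (r[1], -r[0]))[0]
        else:
            grants[out_port] = None
    return grants
```

In a hardware crossbar, each output port's `max` selection would be a parallel comparator tree rather than a sequential loop; the sketch only captures the grant decision itself.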
