Abstract

Large-scale neural network accelerators are often implemented as many-core chips and rely on a network-on-chip (NoC) to handle the massive volume of inter-neuron traffic. The baseline forms and various derivatives of the well-known mesh and tree topologies dominate prior many-core implementations of neural networks. However, the grid-like mesh suffers from a high diameter, and the hierarchical tree from low bisection bandwidth. In this paper, we present ClosNN, a customized Clos topology for Neural Networks. The inherent capability of Clos to support multicast and broadcast traffic simply and efficiently, together with its adaptable bisection bandwidth, motivates adopting a customized version of this topology as the communication infrastructure of large-scale neural network implementations. We compare ClosNN with state-of-the-art NoC topologies adopted in recent neural network hardware accelerators and show that it offers a lower average message hop count and higher throughput, which translates directly into faster neural information processing.
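
As a rough illustration of the diameter and bisection-bandwidth argument, the following Python sketch compares a square 2D mesh with an idealized folded (leaf-spine) Clos. The specific Clos sizing and the helper names mesh_stats/clos_stats are assumptions made for this sketch; they are not the ClosNN configuration evaluated in the paper.

# Illustrative back-of-the-envelope comparison (not taken from the paper):
# diameter and bisection width of a square 2D mesh versus an idealized
# folded (leaf-spine) Clos. The Clos sizing below is assumed for
# illustration only.
import math


def mesh_stats(num_nodes):
    """Diameter (hops) and bisection width (links) of a k x k mesh."""
    k = math.isqrt(num_nodes)
    diameter = 2 * (k - 1)   # corner-to-corner Manhattan distance grows with sqrt(N)
    bisection = k            # links crossing the central cut
    return diameter, bisection


def clos_stats(leaf_switches, spine_switches):
    """Switch-to-switch hop count and bisection width of a folded Clos.

    Every leaf connects to every spine, so any source/destination pair
    is reachable in two inter-switch hops regardless of network size,
    and the bisection grows with the number of spine switches.
    """
    diameter = 2
    bisection = (leaf_switches // 2) * spine_switches  # uplinks cut by a half/half split
    return diameter, bisection


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, "mesh nodes -> (diameter, bisection):", mesh_stats(n))
    # Assumed Clos instance: 16 leaf switches, 8 spine switches.
    print("clos (16 leaves, 8 spines) ->", clos_stats(16, 8))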
