Data streaming and traffic gathering in mesh-based NoC for deep neural network acceleration

Binayak Tiwari, Mei Yang, Xiaohang Wang, Yingtao Jiang

doi:10.1016/j.sysarc.2022.102466

Open Access

https://doi.org/10.1016/j.sysarc.2022.102466

Journal: Journal of Systems Architecture
Publication Date: Mar 19, 2022
Citations: 1
License type: publisher-specific-oa

Affiliation: University of Nevada, Las Vegas

Abstract

The increasing popularity of deep neural network (DNN) applications demands high computing power and efficient hardware accelerator architectures. DNN accelerators use a large number of processing elements (PEs) and on-chip memory for storing weights and other parameters. As the communication backbone of a DNN accelerator, networks-on-chip (NoCs) play an important role in supporting various dataflow patterns and enabling processing with communication parallelism. However, the widely used mesh-based NoC architectures inherently cannot efficiently support the one-to-many and many-to-one traffic that is prevalent in DNN workloads. In this paper, we propose a modified mesh architecture with a one-way/two-way streaming bus to speed up one-to-many (multicast) traffic, and the use of gather packets to support many-to-one (gather) traffic. The analysis of the runtime latency of a convolutional layer shows that the two-way streaming architecture achieves a greater improvement than the one-way streaming architecture for an Output Stationary (OS) dataflow architecture. The simulation results demonstrate that gather packets can reduce runtime latency by up to 1.8 times and network power consumption by up to 1.7 times, compared to the repetitive unicast method on modified mesh architectures supporting two-way streaming. Furthermore, a comparison with a state-of-the-art mesh-based accelerator shows that the proposed gather-supporting scheme has advantages in both area efficiency and power efficiency.
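To give a rough intuition for why a gather packet can help with many-to-one traffic, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: a single row of PEs at distances 1 to N hops from one sink, one result per PE, and a cost measured only as the number of packet link traversals. The function names are hypothetical. The model ignores payload growth, flit counts, and contention, so it is not expected to reproduce the reported 1.8 times figure.

```python
# Toy model (an assumption, not the paper's analysis): PEs sit in one row,
# at distances 1..num_pes hops from a single sink such as a memory port.

def unicast_link_traversals(num_pes: int) -> int:
    """Repetitive unicast: the PE at distance d sends its own packet over d links."""
    return sum(range(1, num_pes + 1))


def gather_link_traversals(num_pes: int) -> int:
    """Gather: one packet leaves the farthest PE and collects each payload en route."""
    return num_pes


if __name__ == "__main__":
    for n in (4, 8, 16):
        u = unicast_link_traversals(n)
        g = gather_link_traversals(n)
        print(f"PEs={n}: unicast traversals={u}, gather traversals={g}")
```

In this toy model the unicast count grows quadratically with the row length while the gather count grows linearly; measured latency and power savings depend on factors the sketch leaves out.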

Full Text