Abstract

The ever-increasing computational complexity of fast-growing Deep Neural Networks (DNNs) has called for new computing paradigms to overcome the memory wall of conventional Von Neumann architectures. The emerging Computing-In-Memory (CIM) architecture is a promising candidate for accelerating neural network computing. However, data movement between CIM arrays may still dominate the total power consumption in conventional designs. This brief proposes a flexible CIM processor architecture named Domino and a "Computing-On-the-Move" (COM) dataflow that enable stream computing and local data access to significantly reduce data-movement energy. In addition, Domino employs customized distributed instruction scheduling within a Network-on-Chip (NoC) to implement inter-memory computing and attain mapping flexibility. Evaluation with prevailing DNN models shows that Domino achieves 1.77× to 2.37× power efficiency over several state-of-the-art CIM accelerators and improves throughput by 1.28× to 13.16×.
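To make the COM idea concrete, below is a minimal sketch, not the paper's implementation, contrasting a conventional dataflow (each tile ships its partial sum back to a distant global buffer) with a computing-on-the-move style (the running sum hops between neighboring tiles and accumulates in transit). All sizes, the linear tile arrangement, and the hop-count cost model are illustrative assumptions.

```python
import numpy as np

# Toy model: NUM_TILES CIM tiles, each holding a weight slice and a local
# activation vector. Sizes are arbitrary and chosen only for illustration.
NUM_TILES, VEC = 4, 8
rng = np.random.default_rng(0)
tile_w = [rng.standard_normal((VEC, VEC)) for _ in range(NUM_TILES)]  # weights stay in memory
tile_x = [rng.standard_normal(VEC) for _ in range(NUM_TILES)]         # local inputs

# Conventional dataflow: tile i sends its partial sum (i + 1) hops
# back to a global accumulation buffer.
psum_conv = np.zeros(VEC)
hops_conv = 0
for i in range(NUM_TILES):
    psum_conv += tile_w[i] @ tile_x[i]
    hops_conv += i + 1            # long-distance movement per tile

# COM-style dataflow: the running partial sum travels one neighbor-to-neighbor
# hop per tile and is accumulated on the way, so no result ever moves far.
psum_com = np.zeros(VEC)
hops_com = 0
for i in range(NUM_TILES):
    psum_com = psum_com + tile_w[i] @ tile_x[i]
    hops_com += 1                 # a single local hop

assert np.allclose(psum_conv, psum_com)  # identical result
print(hops_conv, hops_com)               # 10 vs. 4 hop-units of data movement
```

Under this toy cost model the two dataflows compute the same result, but the in-transit accumulation replaces long return trips with single local hops, which is the mechanism the abstract credits for the reduction in data-movement energy.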
