Abstract

Deep neural network (DNN) processing units, or DPUs, are among the most energy-efficient platforms for DNN applications. However, designing a new DPU for every DNN model is very costly and time-consuming. In this article, we propose an alternative approach: specializing coarse-grained reconfigurable architectures (CGRAs), which are already quite capable of delivering high performance and high energy efficiency for compute-intensive kernels. We identify a small set of architectural features on a baseline CGRA that enable high-performance mapping of depthwise convolution (DWC) and pointwise convolution (PWC) kernels, which are the most important building blocks in recent lightweight DNN models. Our experimental results using MobileNets demonstrate that our proposed CGRA enhancement can deliver an $8\sim18\times$ improvement in area-delay product (ADP), depending on layer type, over a baseline CGRA with a state-of-the-art CGRA compiler. Moreover, our proposed CGRA architecture can also speed up 3-D convolution with efficiency similar to that of previous work, demonstrating the effectiveness of our architectural features beyond depthwise separable convolution (DSC) layers.
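For readers unfamiliar with the DWC/PWC decomposition the abstract refers to, the following is a minimal NumPy sketch of depthwise separable convolution as used in MobileNets: a depthwise convolution filters each input channel independently, then a pointwise (1x1) convolution mixes channels. The shapes, 3x3 filter size, stride-1 "valid" padding, and function names are illustrative assumptions, not the paper's CGRA mapping.

```python
import numpy as np

def depthwise_conv(x, w):
    """x: (C, H, W) input; w: (C, K, K), one KxK filter per channel."""
    C, H, W = x.shape
    _, K, _ = w.shape
    out = np.zeros((C, H - K + 1, W - K + 1))
    for c in range(C):                      # each channel convolved independently
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[c, i, j] = np.sum(x[c, i:i+K, j:j+K] * w[c])
    return out

def pointwise_conv(x, w):
    """x: (C_in, H, W); w: (C_out, C_in), 1x1 filters mixing channels."""
    return np.tensordot(w, x, axes=([1], [0]))   # -> (C_out, H, W)

x = np.random.rand(32, 16, 16)   # C_in = 32 feature map (illustrative sizes)
dw = np.random.rand(32, 3, 3)    # depthwise 3x3 filters
pw = np.random.rand(64, 32)      # pointwise filters, C_out = 64
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)                   # (64, 14, 14)
```

The appeal of DSC is arithmetic: per output pixel it costs $K^2 C_{in} + C_{in} C_{out}$ multiply-accumulates versus $K^2 C_{in} C_{out}$ for a standard 3-D convolution, so in this example $3{\cdot}3{\cdot}32 + 32{\cdot}64 = 2336$ versus $3{\cdot}3{\cdot}32{\cdot}64 = 18432$, roughly an $8\times$ reduction.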
