Abstract

Accelerators that utilise the sparsity of both activation data and network structure in convolutional neural networks (CNNs) have demonstrated efficient processing of CNNs with superior performance. Previous research has identified three critical concerns when designing accelerators for sparse CNNs: data reuse, parallel computing performance, and effective sparse computation. Each of these factors has been addressed in prior accelerator designs, but no design has considered all three at the same time. This study provides analytical approaches and experimental results that reveal insights into accelerator design for sparse CNNs. The authors show that all of these architectural aspects, including their mutual effects, must be considered together to avoid performance pitfalls. Based on the proposed analytical approach, they propose enhancement techniques that co-design across the factors discussed in this study. The improved architecture achieves up to 1.5× higher data reuse and/or 1.55× higher performance than state-of-the-art sparse CNN accelerators while maintaining equal area and energy cost.
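
As a rough illustration of what "effective sparse computation" means in this context (a minimal sketch of zero-skipping, not the architecture evaluated in the paper), the following Python fragment performs multiply-accumulate (MAC) operations only where both operands are non-zero, the basic saving that sparsity-aware accelerators exploit for pruned weights and ReLU-sparse activations. All function names here are illustrative assumptions:

```python
# Illustrative sketch (not the authors' design): zero-skipping computation
# as exploited by sparse CNN accelerators. Operands are compressed into
# (index, value) pairs, and MACs run only on matching non-zero indices.

def compress(vec):
    """Return (index, value) pairs for the non-zero entries of vec."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]

def sparse_dot(weights, activations):
    """Dot product over compressed operands; counts the MACs actually
    performed, mimicking a zero-skipping processing element."""
    w = dict(compress(weights))
    acc = 0
    macs = 0
    for i, a in compress(activations):
        if i in w:              # both operands non-zero: do the MAC
            acc += w[i] * a
            macs += 1
    return acc, macs

if __name__ == "__main__":
    weights = [0, 2, 0, 0, 3, 0, 1, 0]      # sparsity from network pruning
    activations = [5, 0, 0, 7, 4, 0, 2, 0]  # sparsity from ReLU activations
    result, macs = sparse_dot(weights, activations)
    print(f"result={result}, MACs performed={macs} of {len(weights)}")
    # result = 3*4 + 1*2 = 14, using 2 MACs instead of 8
```

The paper's point is that this kind of saving interacts with data reuse and parallel-computing efficiency, so exploiting sparsity in isolation can still leave performance on the table.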
