Abstract

Sparsity, as an intrinsic property of convolutional neural networks (CNNs), has been widely exploited for hardware acceleration, and many customized accelerators tailored to sparse weights or activations have been proposed in recent years. However, the irregular sparsity patterns introduced by both weights and activations make efficient computation far more challenging. For example, due to access contention, workload imbalance, and tile fragmentation, the state-of-the-art sparse accelerator SCNN fails to fully exploit the benefits of sparsity, yielding suboptimal speedup and energy efficiency. In this article, we propose an efficient sparse CNN accelerator for both weights and activations, the fine-grained systolic accelerator (FSA), which jointly optimizes the hardware dataflow and the software partitioning and scheduling strategies. Specifically, to address the access contention problem, we present a fine-grained systolic dataflow in which activations move rhythmically along the horizontal processing-element array while weights are fed into the array in a fine-grained order. We then propose a hybrid network partitioning strategy that applies different partitioning schemes to different layers, balancing the workload and alleviating the fragmentation caused by both sparse weights and activations. Finally, we present a scheduling search strategy that finds optimized schedules for neural networks and further improves energy efficiency. Extensive evaluations show that FSA consistently outperforms SCNN on AlexNet, VGGNet, GoogLeNet, and ResNet, with an average speedup of 1.74× and up to 13.86× higher energy efficiency.
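
To make the dataflow idea above concrete, the Python sketch below simulates a toy 1-D output-stationary systolic pass: each virtual processing element holds one output, activations are read at positions shifted per element (mimicking their horizontal movement through the array), weights are streamed in one at a time, and zero weights or activations are skipped so no multiply is issued for them. This is a minimal illustration under our own assumptions, not the FSA microarchitecture or the paper's actual dataflow; the function name systolic_conv1d and the sample data are hypothetical.

# Minimal sketch (assumption, not the authors' FSA design): a 1-D
# output-stationary systolic convolution where sparse operands are skipped.
def systolic_conv1d(activations, weights):
    """1-D valid convolution; one virtual PE accumulates each output."""
    n_pe = len(activations) - len(weights) + 1   # one PE per output element
    acc = [0] * n_pe                             # output-stationary partial sums
    for k, w in enumerate(weights):              # weights streamed in, one per step
        if w == 0:                               # sparse weight: skip the whole step
            continue
        for pe in range(n_pe):                   # activation arrives shifted by k
            a = activations[pe + k]
            if a != 0:                           # sparse activation: skip the MAC
                acc[pe] += a * w
    return acc

# Example with sparse activations and weights:
print(systolic_conv1d([1, 0, 2, 0, 3, 0], [0, 1, 2]))  # -> [4, 2, 6, 3]

In this simplified view, the zero-skipping branches stand in for the benefit of exploiting sparsity in both operand streams, while the per-PE shifted indexing stands in for the rhythmic movement of activations along the array.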
