Abstract

Field-Programmable Gate Arrays (FPGAs) are a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd transformation and weight pruning are widely adopted to reduce the storage and arithmetic overhead of the matrix multiplications in CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this produces irregular sparse patterns, leading to low parallelism and reduced resource utilization. In this paper, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-Row-Balanced Sparsity (SRBS) pattern, to overcome this challenge. We then develop a two-step hardware co-optimization approach to improve model accuracy under the SRBS pattern. Finally, we design an FPGA accelerator that takes advantage of the SRBS pattern to eliminate low-parallelism computation and irregular memory accesses. Experimental results on VGG16 and ResNet-18 with CIFAR-10 and ImageNet show up to 4.4x and 3.06x speedup over a state-of-the-art dense Winograd accelerator, and a 52% performance improvement (against a theoretical upper bound of 72%) over a state-of-the-art sparse Winograd accelerator. The resulting sparsity ratios are 80% and 75%, respectively, with negligible loss of model accuracy.
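
To make the SRBS pattern concrete, the following is a minimal NumPy sketch of one plausible pruning pass, assuming that SRBS splits each row of a Winograd-domain weight matrix into fixed-length sub-rows and keeps the same number of largest-magnitude weights in every sub-row. The function name `srbs_prune` and the parameters `sub_row_len` and `keep_per_sub_row` are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def srbs_prune(weights, sub_row_len=8, keep_per_sub_row=2):
    """Illustrative Sub-Row-Balanced Sparsity (SRBS) pruning (assumed scheme).

    Each row of a 2-D Winograd-domain weight matrix is split into
    fixed-length sub-rows; within every sub-row only the
    `keep_per_sub_row` largest-magnitude weights are kept, so all
    sub-rows end up with the same number of nonzeros. That balance is
    what makes the sparsity regular rather than irregular.
    """
    rows, cols = weights.shape
    assert cols % sub_row_len == 0, "row length must divide evenly into sub-rows"
    pruned = weights.copy()
    for r in range(rows):
        for start in range(0, cols, sub_row_len):
            sub = pruned[r, start:start + sub_row_len]  # view into `pruned`
            # Zero out the smallest-magnitude weights in this sub-row.
            drop = np.argsort(np.abs(sub))[: sub_row_len - keep_per_sub_row]
            sub[drop] = 0.0
    return pruned

# Example: keep 2 of every 8 weights per sub-row, i.e. a 75% sparsity ratio.
w = np.random.randn(4, 16).astype(np.float32)
w_sparse = srbs_prune(w, sub_row_len=8, keep_per_sub_row=2)
print((w_sparse == 0).mean())  # prints 0.75
```

Because every sub-row carries an identical nonzero count, a datapath can assign a fixed number of multiplier lanes per sub-row and fetch indices with a fixed stride; this is the kind of regularity the abstract credits with eliminating low-parallelism computation and irregular memory accesses.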
