Abstract

Many applications currently run convolutional neural networks (CNNs) on CPUs or GPUs, where performance is limited by the computational complexity of these networks. Compared with CPU or GPU implementations, deploying convolutional neural network accelerators on FPGAs can achieve superior performance. However, the multiplication operations in CNNs have constrained FPGAs from achieving better performance. In this paper, we propose a shift convolutional neural network accelerator that converts multiplication operations into shift operations. Based on the shift operation, our accelerator breaks the computational bottleneck of FPGAs. On a Virtex UltraScale+ VU9P, our accelerator saves DSP resources and reduces memory consumption while achieving a performance of 1.18 TOPS (tera operations per second), a substantial improvement over other convolutional neural network accelerators.
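The abstract's core idea, replacing multiplications with shifts, can be illustrated with a minimal sketch. Assuming weights are quantized to signed powers of two (the abstract does not state the exact quantization scheme, so `quantize_to_power_of_two` and `shift_multiply` are hypothetical names for illustration), a multiply by a weight reduces to a single bit shift:

```python
import math

def quantize_to_power_of_two(w: float) -> tuple[int, int]:
    """Approximate weight w as sign * 2**exponent (a common assumption
    for shift-based accelerators; not necessarily the paper's scheme)."""
    if w == 0:
        return 0, 0
    sign = -1 if w < 0 else 1
    exponent = round(math.log2(abs(w)))
    return sign, exponent

def shift_multiply(x: int, sign: int, exponent: int) -> int:
    """Multiply integer activation x by sign * 2**exponent using only shifts."""
    if exponent >= 0:
        return sign * (x << exponent)   # positive exponent: left shift
    return sign * (x >> -exponent)      # negative exponent: right shift

# Example: weight 0.25 ~ 2**-2, activation 40 -> 40 >> 2 = 10
s, e = quantize_to_power_of_two(0.25)
print(shift_multiply(40, s, e))  # 10
```

On an FPGA, such a shift needs no DSP multiplier block, which is why this transformation can save DSP resources, as the abstract reports.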
