Abstract

Convolutional Neural Networks (CNNs) have proven highly effective in image recognition and other Artificial Intelligence (AI) applications, but at the expense of intensive computation. To address this challenge, researchers have proposed various network pruning techniques. However, because of their irregular sparsity patterns, unstructured sparse networks are difficult to compute efficiently on either Graphics Processing Units (GPUs) or Field-Programmable Gate Arrays (FPGAs). In this paper, we propose RSNN, a software/hardware co-optimized Reconfigurable Sparse convolutional Neural Network accelerator design on FPGAs. We introduce a novel sparse convolution dataflow with simpler control logic than existing mux-based selection logic. To balance the computation load across different Processing Units (PUs), we propose a software-based load-balance-aware pruning technique as well as a kernel merging method. Experimental results show that RSNN achieves $2.41\times - 7.91\times$ better Digital Signal Processor (DSP) efficiency than previous dense CNN FPGA accelerators, and $1.23\times - 2.93\times$ better than previous sparse CNN FPGA accelerators.
