Abstract

Deep neural networks (DNNs) are promising solutions for many artificial intelligence and machine learning applications in fields such as safety and transportation, medicine, and weather forecasting. State-of-the-art deep neural networks can have hundreds of millions of parameters, which makes them less than ideal for mass adoption on devices with constrained memory and power budgets, such as edge computing and mobile devices. Techniques like quantization and inducing sparsity aim to reduce the total number of computations needed for deep learning inference. General-purpose computing hardware such as CPUs (Central Processing Units) and GPUs (Graphics Processing Units) is not optimized for portable embedded applications, as it is not energy efficient. Field Programmable Gate Arrays (FPGAs) are suitable candidates for edge computing, delivering decent power consumption along with flexibility. We propose an efficient sparse matrix-vector multiplication (SpMV) architecture that makes deep learning inference faster and more efficient and also reduces the memory-bandwidth bottleneck. The multiplication is handled by multiple multiply-and-accumulate (MAC) channels, and the architecture can exploit the maximum available memory bandwidth of the computing device. The proposed sparse matrix multiplier architecture has been implemented on a Zynq UltraScale+ FPGA at an operating frequency of 270 MHz and gave a performance gain of up to 5× compared with an existing implementation.
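To make the computation concrete, the following is a minimal software sketch of sparse matrix-vector multiplication with a row's non-zeros spread across several independent accumulators, loosely mirroring the idea of multiple MAC channels. It assumes CSR storage and illustrative names (csr_spmv, NUM_MACS); the abstract does not specify the paper's actual sparse format or channel count.

```c
#include <stddef.h>

/* Illustrative sketch: y = A * x with A stored in CSR format.
 * Each row's products are spread over NUM_MACS accumulators,
 * a software analogue of parallel multiply-and-accumulate channels.
 * NUM_MACS and csr_spmv are assumed names, not from the paper. */
#define NUM_MACS 4

void csr_spmv(size_t n_rows,
              const float *values,   /* non-zero values                  */
              const int   *col_idx,  /* column index of each non-zero    */
              const int   *row_ptr,  /* row start offsets, length n_rows+1 */
              const float *x,        /* dense input vector               */
              float       *y)        /* dense output vector              */
{
    for (size_t r = 0; r < n_rows; ++r) {
        float acc[NUM_MACS] = {0.0f};

        /* Distribute the row's non-zeros round-robin over the accumulators. */
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            acc[(k - row_ptr[r]) % NUM_MACS] += values[k] * x[col_idx[k]];
        }

        /* Reduce the partial sums into the output element. */
        float sum = 0.0f;
        for (int m = 0; m < NUM_MACS; ++m)
            sum += acc[m];
        y[r] = sum;
    }
}
```

In a hardware realization the accumulators would be physical MAC units fed by parallel memory streams; here the round-robin split simply shows how the per-row dot product decomposes into independent partial sums.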
