Abstract

Field-programmable gate array (FPGA)-based accelerators for convolutional neural network (CNN) inference have received significant attention in recent years. The reported designs tend to adopt a similar underlying approach based on multiplier-accumulator (MAC) arrays, which places heavy demand on the available on-chip DSP blocks while leaving FPGA logic and memory resources underutilized. The practical outcome is that the computational roof of the accelerator is bound by the number of DSP blocks offered by the target FPGA. In addition, integrating the CNN accelerator with other functional units that also need DSP blocks degrades inference performance. Leveraging the robustness of inference accuracy to limited arithmetic precision, we propose a transformation of the convolution computation that reshapes the accelerator design space and relieves the pressure on DSP resources. Through analytical and empirical evaluations, we demonstrate that our approach strikes a favorable balance among the utilization of FPGA on-chip memory, logic, and DSP resources, which allows our accelerator to considerably outperform the state of the art. We report the effectiveness of our approach on a variety of FPGA devices, including Cyclone-V, Stratix-V, and Arria-10, which are used in a wide range of applications, from embedded settings to high-performance computing. Our proposed technique yields a 1.5x throughput improvement and a 4x DSP resource reduction compared to the best frequency-domain convolution-based accelerator, and a 2.5x boost in raw arithmetic performance and an 8.4x saving in DSPs compared to a state-of-the-art sparse convolution-based accelerator.
