Abstract

Deep learning workloads such as convolutional neural networks (CNNs) are important because they increasingly demand high-performance hardware acceleration. One distinguishing feature of a deep learning workload is that it is inherently resilient to small numerical errors and thus works very well with low-precision hardware. We propose a novel method called double multiply-and-accumulate (MAC), which can theoretically double the computation rate of CNN accelerators by packing two MAC operations into one digital signal processing (DSP) block of off-the-shelf field-programmable gate arrays (FPGAs). We overcame several technical challenges by exploiting the mode of operation of the CNN accelerator. We validated our method through FPGA synthesis and Verilog simulation, and evaluated it by applying it to a state-of-the-art CNN accelerator. Our double MAC approach can double the computation throughput of a CNN layer. At the network level (all convolution layers combined), the performance improvement varies with the CNN application and the FPGA size, ranging from 14% to more than 80% over a highly optimized state-of-the-art accelerator solution, without significantly sacrificing output quality.
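
The abstract describes the packing idea only at a high level; the sketch below illustrates one common way such operand packing can work, using plain Python integer arithmetic. The bit widths (8-bit operands with a 16-bit guard gap), the function name double_mac, and the restriction to unsigned values are illustrative assumptions, not details taken from the paper; a real DSP-block implementation must additionally handle signed operands and in-block accumulation.

    # Minimal sketch of packing two MACs into one wide multiply.
    # Assumptions (not from the paper): unsigned 8-bit operands and a
    # 16-bit guard gap so the two partial products cannot overlap.
    def double_mac(a, w1, w2, acc1=0, acc2=0):
        """Perform acc1 += a*w1 and acc2 += a*w2 with a single multiply."""
        assert 0 <= a < 256 and 0 <= w1 < 256 and 0 <= w2 < 256
        packed = (w1 << 16) | w2   # one multiplier input carries both weights
        product = a * packed       # single wide multiplication (one DSP multiply)
        p1 = product >> 16         # upper bit field holds a * w1
        p2 = product & 0xFFFF      # lower bit field holds a * w2 (< 2**16)
        return acc1 + p1, acc2 + p2

    # The packed computation matches two independent MACs.
    acc1, acc2 = double_mac(a=200, w1=123, w2=45)
    assert acc1 == 200 * 123 and acc2 == 200 * 45

Because a*w2 is at most 255*255 < 2**16, the low 16 bits of the product hold a*w2 exactly, leaving the upper bits to hold a*w1; this is why one wide multiplier can deliver two products per operation.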
