Abstract

In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. Also recently, due to its flexibility, faster development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive solution for exploiting the inherent parallelism in the feedforward process of the CNN. However, to meet the accuracy demands of today's practical recognition applications, which typically involve massive datasets, CNNs have to grow larger and deeper. This enlargement aggravates the off-chip memory bottleneck on the FPGA platform, since there is not enough on-chip space to hold large datasets. In this work, we propose a memory system architecture that best matches the off-chip memory traffic to the optimum throughput of the computation engine while operating at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate the memory bandwidth utilization of the system at different operating frequencies, since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. To find the solution with the best energy efficiency, we make a trade-off between energy efficiency and computational throughput: the chosen design saves 18% of energy consumption at the cost of less than a 2% reduction in throughput. We also propose a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator. Experimental results show that our CNN accelerator achieves a peak performance of 52.11 GFLOPS and an energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, which outperforms most previous approaches.
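For context, the classic Roofline model bounds attainable throughput by the minimum of the compute roof and the product of off-chip bandwidth and operational intensity. The sketch below illustrates how an operating-frequency term could enter such a bound; the linear scaling of the compute roof with clock frequency and all numeric values are illustrative assumptions, not details taken from the paper's extended model.

```python
# Minimal sketch of a Roofline-style bound extended with operating frequency.
# Assumptions (not from the paper): the compute roof scales linearly with the
# clock frequency, and off-chip bandwidth is a fixed platform limit.

def attainable_gflops(ops_per_cycle: float,
                      freq_mhz: float,
                      bandwidth_gbs: float,
                      ops_per_byte: float) -> float:
    """Return the Roofline bound on throughput in GFLOPS.

    ops_per_cycle  -- floating-point operations the compute engine completes per cycle
    freq_mhz       -- operating frequency of the accelerator in MHz
    bandwidth_gbs  -- available off-chip memory bandwidth in GB/s
    ops_per_byte   -- operational intensity (FLOPs per byte of off-chip traffic)
    """
    compute_roof = ops_per_cycle * freq_mhz / 1e3   # GFLOPS reachable at this frequency
    memory_roof = bandwidth_gbs * ops_per_byte      # GFLOPS permitted by bandwidth
    return min(compute_roof, memory_roof)

# Hypothetical example: 256 ops/cycle at 250 MHz, 4.2 GB/s off-chip bandwidth,
# and an operational intensity of 20 FLOPs/byte.
print(attainable_gflops(256, 250, 4.2, 20))  # min(64.0, 84.0) -> 64.0 GFLOPS
```

Sweeping freq_mhz in such a bound shows when raising the clock stops paying off because the memory roof dominates, which is the kind of frequency-aware trade-off the abstract describes.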
