Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks

Yufei Ma, Yu Cao, Sarma Vrudhula, Jae-Sun Seo

doi:10.1145/3020078.3021736

Abstract

As convolution layers account for most of the operations in convolutional neural network (CNN) algorithms, an effective convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution in CNNs involves three-dimensional multiply-and-accumulate (MAC) operations with four levels of loops, which results in a large design space. Prior works either employ limited loop optimization techniques (e.g., loop unrolling, tiling, and interchange) or tune only some of the design variables after the accelerator architecture and dataflow have already been fixed. Without a thorough study of convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g., required memory access) of the CNN accelerator based on multiple design variables. We systematically explore the trade-offs of hardware cost by searching the design variable configurations, and propose a specific dataflow of hardware CNN acceleration that minimizes memory access and data movement while maximizing resource utilization to achieve high performance. The proposed CNN acceleration scheme and architecture are demonstrated on a standalone Altera Arria 10 GX 1150 FPGA by implementing the end-to-end VGG-16 CNN model, achieving 645.25 GOPS of throughput and 47.97 ms of latency, a >3.2× enhancement compared to state-of-the-art FPGA implementations of the VGG model.
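
The four-level loop nest mentioned above, and the kind of loop tiling and interchange that reorders it, can be illustrated with a minimal Python sketch. It assumes stride 1 and no padding, and uses the labels Loop-1 (kernel window), Loop-2 (input feature maps), Loop-3 (positions within an output feature map), and Loop-4 (output feature maps) as illustrative naming. The tile sizes `t_of` and `t_oy` and all function names are hypothetical. This is not the authors' dataflow or hardware design; it only shows that tiling Loop-3/Loop-4 and moving Loop-2 outward within a tile changes the order of the MACs (and therefore which data could be reused from on-chip buffers) without changing the result.

```python
def mac_count(n_of, n_oy, n_ox, n_if, n_ky, n_kx):
    """Total MACs of one convolution layer: the product of all loop bounds."""
    return n_of * n_oy * n_ox * n_if * n_ky * n_kx


def conv_naive(x, w):
    """Direct four-level loop nest (stride 1, no padding).

    x[ci][iy][ix] is the input, w[co][ci][ky][kx] the weights;
    returns out[co][oy][ox].
    """
    n_if, n_iy, n_ix = len(x), len(x[0]), len(x[0][0])
    n_of, n_ky, n_kx = len(w), len(w[0][0]), len(w[0][0][0])
    n_oy, n_ox = n_iy - n_ky + 1, n_ix - n_kx + 1
    out = [[[0] * n_ox for _ in range(n_oy)] for _ in range(n_of)]
    for co in range(n_of):                      # Loop-4: output feature maps
        for oy in range(n_oy):                  # Loop-3: output positions
            for ox in range(n_ox):
                acc = 0
                for ci in range(n_if):          # Loop-2: input feature maps
                    for ky in range(n_ky):      # Loop-1: kernel window
                        for kx in range(n_kx):
                            acc += x[ci][oy + ky][ox + kx] * w[co][ci][ky][kx]
                out[co][oy][ox] = acc
    return out


def conv_tiled(x, w, t_of=2, t_oy=2):
    """Same computation with Loop-4 and Loop-3 (rows) tiled and Loop-2
    interchanged to the outside within each tile, so partial sums are
    accumulated into `out` one input feature map at a time."""
    n_if, n_iy, n_ix = len(x), len(x[0]), len(x[0][0])
    n_of, n_ky, n_kx = len(w), len(w[0][0]), len(w[0][0][0])
    n_oy, n_ox = n_iy - n_ky + 1, n_ix - n_kx + 1
    out = [[[0] * n_ox for _ in range(n_oy)] for _ in range(n_of)]
    for co0 in range(0, n_of, t_of):            # tiles of Loop-4
        for oy0 in range(0, n_oy, t_oy):        # tiles of Loop-3 (rows)
            for ci in range(n_if):              # Loop-2, moved outward
                # One input map slice is used by every output map in the tile.
                for co in range(co0, min(co0 + t_of, n_of)):
                    for oy in range(oy0, min(oy0 + t_oy, n_oy)):
                        for ox in range(n_ox):
                            acc = 0
                            for ky in range(n_ky):
                                for kx in range(n_kx):
                                    acc += x[ci][oy + ky][ox + kx] * w[co][ci][ky][kx]
                            out[co][oy][ox] += acc  # partial sum over ci
    return out


if __name__ == "__main__":
    # Small integer tensors: 3 input maps of 5x5, 4 output maps, 3x3 kernels.
    x = [[[(c * 7 + y * 3 + i) % 5 for i in range(5)] for y in range(5)]
         for c in range(3)]
    w = [[[[(o + c + ky * 2 + kx) % 3 - 1 for kx in range(3)] for ky in range(3)]
          for c in range(3)] for o in range(4)]
    assert conv_naive(x, w) == conv_tiled(x, w, t_of=3, t_oy=2)
    # VGG-16 conv1_2 as an example layer: 64 -> 64 maps, 224x224 output, 3x3 kernel.
    print(mac_count(64, 224, 224, 64, 3, 3))
```

Integer inputs are used so the tiled and untiled results can be compared exactly; with floating point, reordering the accumulation could introduce small rounding differences.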

Similar Papers

Lead the way for us

Similar Papers

Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA
Yufei Ma ... Sarma Vrudhula
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 26
01 Jul 2018

Improving the Performance of CNN Accelerator Architecture under the Impact of Process Variations
Jingweijia Tan ... Maodi Ma
ACM Transactions on Design Automation of Electronic Systems | VOL. 28
09 Sep 2023

Reconfigurable Network-on-Chip based Convolutional Neural Network Accelerator
Arash Firuzan ... Ahmad Khademzadeh
Journal of Systems Architecture | VOL. 129
23 May 2022

Approach to Improve the Performance Using Bit-level Sparsity in Neural Networks
Yesung Kang ... Seokhyeong Kang
01 Feb 2021
