Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform

Van-Cam Nguyen, Yasuhiko Nakashima

doi:10.1587/transinf.2022edp7155

Abstract

Deep convolutional neural network (CNN) inference accelerators on field-programmable gate array (FPGA) platforms have been widely adopted for their low power consumption and high performance. In this paper, we make three contributions to improve performance and power efficiency. First, we use high bandwidth memory (HBM) to expand the data-transfer bandwidth between off-chip memory and the accelerator. Second, we implement a fully pipelined design, consisting of pipelined inter-layer computation and a pipelined computation engine, to reduce idle time between layers. Third, we design a multi-core architecture with shared dual buffers to reduce off-chip memory access and maximize throughput. We implemented the proposed accelerator on the Xilinx Alveo U280 platform directly in Verilog HDL rather than with high-level synthesis as in previous works, and used the VGG-16 model to verify the system in our experiments. With a similar accelerator architecture, the experimental results show that the memory bandwidth of HBM is 13.2× higher than that of DDR4. In terms of throughput, our accelerator is 1.9×/1.65×/11.9× better than an FPGA+HBM2-based accelerator, a GPGPU at low batch size (4), and a CPU at low batch size (4), respectively. In terms of power efficiency, our proposed system provides 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement over previous DDR+FPGA/DDR+GPGPU/DDR+CPU-based accelerators, respectively, with the large-scale CNN model.
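
To illustrate why pipelined inter-layer computation can reduce idle time between layers, the following is a minimal timing sketch, not the accelerator described in the paper. It assumes each layer is a stage with a fixed per-image latency and that buffering between stages is unlimited. The function names and the latency values are hypothetical and chosen only for illustration.

```python
def sequential_time(stage_latencies, num_images):
    """Total time when each image passes through every layer
    before the next image is allowed to start."""
    return num_images * sum(stage_latencies)


def pipelined_time(stage_latencies, num_images):
    """Total time when layers run as pipeline stages, so a layer can
    start the next image as soon as it has finished the current one.

    Assumption: unlimited buffering between stages, so an upstream
    stage never stalls waiting for a downstream stage.
    """
    # finish[s] is the time at which stage s finished its latest image.
    finish = [0] * len(stage_latencies)
    for _ in range(num_images):
        prev_done = 0  # when the previous stage finished this image
        for s, latency in enumerate(stage_latencies):
            # A stage starts once its input is ready and it is free.
            start = max(prev_done, finish[s])
            finish[s] = start + latency
            prev_done = finish[s]
    return finish[-1]


if __name__ == "__main__":
    latencies = [2, 4, 3]  # hypothetical per-layer latencies (arbitrary units)
    images = 10
    print("sequential:", sequential_time(latencies, images))
    print("pipelined: ", pipelined_time(latencies, images))
```

In this toy model the pipelined total approaches the number of images times the slowest stage's latency, which is why balancing stage latencies matters for throughput.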

More From: IEICE Transactions on Information and Systems

Similar Papers

An Efficient Task Assignment Framework to Accelerate DPU-Based Convolutional Neural Network Inference on FPGAs
Jiang Zhu ... Jianqi Li
IEEE Access | VOL. 8
01 Jan 2020

CNN inference acceleration on limited resources FPGA platforms_epilepsy detection case study
Afef Saidi ... Slim Ben Othman
International Journal of Informatics and Communication Technology (IJ-ICT) | VOL. 12
01 Dec 2023

Research and Implementation of High Computational Power for Training and Inference of Convolutional Neural Networks
Tianling Li ... Bin He
Applied Sciences | VOL. 13
11 Jan 2023

FitNN: A Low-Resource FPGA-Based CNN Accelerator for Drones
Zhichao Zhang ... Abbas Z Kouzani
IEEE Internet of Things Journal | VOL. 9
01 Nov 2022
