Abstract

In this paper, an FPGA-based convolutional neural network coprocessor is proposed. The coprocessor comprises a 1D convolution computation unit (PE) operating in row-stationary (RS) streaming mode and a 3D convolution computation unit (PE chain) organized as a systolic array. The coprocessor can flexibly control the number of enabled PE arrays according to the number of output channels of the convolutional layer. We design a storage system with a multilevel cache, in which the global cache distributes data to the local caches by multicast, and we propose an image segmentation method compatible with the hardware architecture. The proposed coprocessor implements the convolutional and pooling layers of the VGG16 neural network model, with activation, weight, and bias values quantized using 16-bit fixed-point quantization; it achieves a peak computational performance of 316.0 GOP/s and an average computational performance of 62.54 GOP/s at a clock frequency of 200 MHz, with a power consumption of about 9.25 W.
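
As a rough illustration of the 16-bit fixed-point quantization mentioned above, the C sketch below converts floating-point values to signed 16-bit codes with saturation. The 8.8 integer/fraction split (FRAC_BITS = 8) is an assumption for illustration only; the abstract does not specify the paper's actual Q-format.

```c
#include <stdint.h>
#include <math.h>

/* Hypothetical Q-format split: the paper only states 16-bit fixed
 * point; an 8.8 split is assumed here for illustration. */
#define FRAC_BITS 8

/* Quantize a float to signed 16-bit fixed point with saturation. */
static int16_t quantize_q(float x) {
    float scaled = roundf(x * (float)(1 << FRAC_BITS));
    if (scaled >  32767.0f) scaled =  32767.0f;   /* saturate high */
    if (scaled < -32768.0f) scaled = -32768.0f;   /* saturate low  */
    return (int16_t)scaled;
}

/* Recover an approximate float from the fixed-point code. */
static float dequantize_q(int16_t q) {
    return (float)q / (float)(1 << FRAC_BITS);
}
```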

Highlights

  • Hardware acceleration of artificial neural networks (ANNs) has been a hot research topic since the 1990s [1, 2]

  • We provide a coprocessor implementation for convolutional neural networks, aimed at accelerating the convolutional and pooling layers of convolutional neural networks on FPGAs for use in heterogeneous acceleration systems or embedded terminals

  • The PE array designed in this paper contains 4 PE chains, each corresponding to one channel of the input feature map; when the input feature map has only three channels, the fourth local image buffer (LIB) is filled with zeros for convenience of control (a sketch of this convention follows this list)
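
A minimal sketch of the zero-padding convention described in the last highlight, under assumed names and tile sizes (lib, TILE, and load_tiles are illustrative, not the paper's RTL): each PE chain reads from its own local image buffer, and a buffer left at zero contributes nothing to the accumulated partial sum, so a 3-channel input needs no special-case control.

```c
#include <string.h>

#define NUM_CHAINS 4   /* PE chains in the array (one per input channel) */
#define TILE       64  /* illustrative tile size, not from the paper */

/* Local image buffers, one per PE chain; unused channels stay zero. */
static short lib[NUM_CHAINS][TILE];

/* Load `channels` input-feature-map tiles and zero-fill the rest,
 * mirroring the "fourth LIB is all 0" control simplification. */
void load_tiles(const short *ifmap, int channels) {
    for (int c = 0; c < NUM_CHAINS; ++c) {
        if (c < channels)
            memcpy(lib[c], ifmap + c * TILE, sizeof lib[c]);
        else
            memset(lib[c], 0, sizeof lib[c]);  /* padded channel */
    }
}

/* Each chain contributes a partial sum; a zeroed chain adds nothing. */
int accumulate(int idx, const short w[NUM_CHAINS]) {
    int psum = 0;
    for (int c = 0; c < NUM_CHAINS; ++c)
        psum += lib[c][idx] * w[c];
    return psum;
}
```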

Summary

Introduction

Hardware acceleration of artificial neural networks (ANNs) has been a hot research topic since the 1990s [1, 2]. Convolutional neural networks were proposed as early as 1989 but did not become a research hotspot until 2006, largely because of the limited hardware computing power available at the time. In general-purpose computing platforms, all Arithmetic Logic Units (ALUs) share controllers and memory, and the convolutional and fully connected layers are mapped into matrix multiplications to participate in the computation. FPGAs are highly programmable and configurable, offer high energy efficiency and short development cycles, and tools such as High-Level Synthesis and OpenCL further accelerate FPGA development. Sankaradas et al. designed an FPGA-based coprocessor for CNNs [11] with low-precision data bit-widths (20-bit fixed-point quantization for weights and 16-bit fixed-point quantization for feature-map values), but it supports only a fixed convolutional kernel size and incurs frequent off-chip memory accesses.
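
A minimal sketch of this lowering of convolution to matrix multiplication, assuming a single input channel and unit stride: each KxK input patch is unrolled into one column (the common im2col transform), after which the convolution becomes a plain matrix multiply. The function names and data layouts here are illustrative, not taken from the paper or any specific platform.

```c
/* Lower a KxK convolution over an HxW single-channel image to a
 * matrix multiply: each output pixel becomes one column of K*K
 * unrolled input values (the classic im2col transform). */
void im2col(const float *img, int H, int W, int K,
            float *cols /* [K*K][(H-K+1)*(W-K+1)], row-major */) {
    int OH = H - K + 1, OW = W - K + 1;
    for (int oy = 0; oy < OH; ++oy)
        for (int ox = 0; ox < OW; ++ox)
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    cols[(ky * K + kx) * (OH * OW) + oy * OW + ox] =
                        img[(oy + ky) * W + (ox + kx)];
}

/* The convolution is then out[1 x N] = w[1 x KK] * cols[KK x N],
 * where N = OH*OW and KK = K*K. */
void conv_as_gemm(const float *w, const float *cols,
                  int KK, int N, float *out) {
    for (int n = 0; n < N; ++n) {
        float acc = 0.0f;
        for (int k = 0; k < KK; ++k)
            acc += w[k] * cols[k * N + n];
        out[n] = acc;
    }
}
```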

Coprocessor Architecture
Design of Each Major Module in the Coprocessor
FPGA Hardware Verification
Findings
Conclusion