Performance Exploration on Pre-implemented CNN Hardware Accelerator on FPGA

Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, Christophe Bobda

doi:10.1109/icfpt51103.2020.00055

Abstract

As the complexity of FPGA architectures increases, there is a rising need to improve productivity and performance in several computing domains such as image processing, financial analytics, edge computing and deep learning. However, vendor tools are mostly general-purpose: they attempt to provide an acceptable quality of result (QoR) on a broad set of applications, and so may not exploit application- or domain-specific characteristics to deliver higher QoR. In this paper, we present a divide-and-conquer design flow that enables application- and domain-specific optimization in the design of convolutional neural network (CNN) architectures on Xilinx FPGAs. The proposed approach follows three fundamental steps. Step 1: break the design down into components. Step 2: implement these components separately. Step 3: efficiently generate the final design by assembling the pre-built components with minimal QoR loss. Recent research has even demonstrated that such approaches may provide better QoR than the traditional Vivado flow in some instances [1], [2]. By pre-implementing specific components of a design, higher performance can be achieved locally and maintained to a certain extent when assembling the final circuit. This approach is supported by two main observations [1]: (1) vendor tools such as Vivado tend to deliver high-performance results on small modules in a design, and (2) computing applications such as machine learning designs increase in size by replicating modules. CNN inference refers to the forward propagation of M input images through L layers. The repetition of components within CNN architectures makes them suitable candidates for RapidWright implementation, as the CNN sub-modules can be optimized for performance in isolation, and the achieved performance can be preserved when replicating and relocating the modules across the FPGA.
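
The three steps above can be pictured with a small toy model. The following Python sketch is only an illustration under stated assumptions: it does not call RapidWright or Vivado, the names `PreBuilt`, `decompose`, `pre_implement` and `assemble` are hypothetical, and the frequency values are placeholders supplied by the caller rather than results from the paper. It captures two ideas from the abstract: each distinct component is implemented once and then reused for every replica, and, assuming a single clock domain, the assembled design cannot run faster than its slowest pre-built component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreBuilt:
    """Hypothetical record for a component implemented in isolation."""
    name: str
    fmax_mhz: float  # standalone frequency; placeholder supplied by the caller


def decompose(layers):
    """Step 1: list the distinct component kinds used by the design."""
    kinds = []
    for kind in layers:
        if kind not in kinds:
            kinds.append(kind)
    return kinds


def pre_implement(kinds, standalone_fmax):
    """Step 2: implement each distinct kind once (modeled as a lookup)."""
    return {kind: PreBuilt(kind, standalone_fmax[kind]) for kind in kinds}


def assemble(layers, library):
    """Step 3: reuse the pre-built components for every layer instance.

    Returns the instances and an upper bound on the design frequency,
    assuming one clock domain. Inter-component routing can only lower it.
    """
    instances = [library[kind] for kind in layers]
    bound_mhz = min(component.fmax_mhz for component in instances)
    return instances, bound_mhz


# Illustrative layer sequence and placeholder frequencies (not measured data).
layers = ["conv", "pool", "conv", "pool", "fc"]
placeholder_fmax = {"conv": 400.0, "pool": 550.0, "fc": 450.0}

library = pre_implement(decompose(layers), placeholder_fmax)
instances, bound_mhz = assemble(layers, library)
```

Here five layer instances are served by three pre-built components, which is the replication property the abstract relies on. How much of the bound survives assembly depends on the stitching and relocation steps, which this sketch does not model.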
