A Reconfigurable CNN-Based Accelerator Design for Fast and Energy-Efficient Object Detection System on Mobile FPGA

Victoria Heekyung Kim, Kyuwon Ken Choi

doi:10.1109/access.2023.3285279

Open Access

Journal: IEEE Access	Publication Date: Jan 1, 2023
Citations: 14	License type: CC BY-NC-ND 4.0

Affiliation: Illinois Institute of Technology

Abstract

In resource-limited edge computing environments such as mobile devices, IoT devices, and electric vehicles, energy-efficient convolutional neural network (CNN) accelerators implemented on mobile field-programmable gate arrays (FPGAs) are becoming more attractive because of their high accuracy and scalability. Mobile FPGAs such as the Xilinx PYNQ-Z1/Z2 and Ultra96 offer scalability and flexibility for implementing deep-learning-based object detection applications. They also suit battery-powered systems, especially drones and electric vehicles, where power consumption and size are the main constraints. However, their limited performance makes real-time processing difficult to achieve. Most accelerator optimization techniques are applied at the system level on the FPGA. In this article, we instead optimize the accelerator design flow at the register-transfer level (RTL), applying low-power techniques to the FPGA accelerator implementation to achieve fast programming speed, and we propose a reconfigurable accelerator design for a CNN-based object detection system at the RTL on a mobile FPGA. We present RTL optimization techniques, including several types of clock gating, that eliminate residual signals and deactivate unnecessarily active blocks. Based on an analysis of the CNN-based object detection architecture, we identify and classify the common computing components of the CNN, such as multipliers and adders. We implement the multiplier/adder unit as a universal computing unit and modularize it to fit a hierarchical RTL code structure.
The proposed design was tested with ResNet-20, which has 23 layers, trained on the CIFAR-10 dataset, which provides a test set of 10,000 images in several formats; the weight data used in this experiment was provided by Tensil. Experimental results show that the proposed design process improves power efficiency, hardware utilization, and throughput by 16%, up to 58%, and 15%, respectively.
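
The idea of a shared multiplier/adder unit combined with clock gating can be illustrated with a small behavioral model. The sketch below is not the authors' RTL. It is a Python model under stated assumptions: the names `GatedMac`, `dot_with_gating`, and `conv2d_valid` are hypothetical, and the gating policy (suppress the clock when either operand is zero) is one illustrative choice that the abstract does not specify. It shows a single multiply-accumulate unit reused for every output pixel of a convolution, and it counts how many cycles a clock-enable signal would leave the accumulator register idle.

```python
from dataclasses import dataclass


@dataclass
class GatedMac:
    """Behavioral model of a multiply-accumulate unit with a clock-enable gate."""
    acc: int = 0
    active_cycles: int = 0  # cycles in which the accumulator was clocked
    gated_cycles: int = 0   # cycles in which the clock was suppressed

    def step(self, a: int, b: int, enable: bool) -> int:
        # When enable is low the register keeps its value, as a gated clock would.
        if enable:
            self.acc += a * b
            self.active_cycles += 1
        else:
            self.gated_cycles += 1
        return self.acc


def dot_with_gating(weights, activations):
    """Dot product on one MAC; assumed policy: gate when either operand is zero."""
    mac = GatedMac()
    for w, x in zip(weights, activations):
        mac.step(w, x, enable=(w != 0 and x != 0))
    return mac


def conv2d_valid(image, kernel):
    """Valid 2-D convolution (no kernel flip) built from the shared MAC model."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    out, active, gated = [], 0, 0
    for r in range(oh):
        out_row = []
        for c in range(ow):
            # Flatten the window in the same row-major order as the kernel.
            window = [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            mac = dot_with_gating(flat_kernel, window)
            out_row.append(mac.acc)
            active += mac.active_cycles
            gated += mac.gated_cycles
        out.append(out_row)
    return out, active, gated
```

Calling `conv2d_valid(image, kernel)` returns the output feature map together with the number of clocked and gated cycles. In this toy model the ratio of gated to total cycles stands in for the switching activity that clock gating removes. It does not model the power, utilization, or throughput figures reported in the abstract.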

Similar Papers

A Solution to Optimize Multi-Operand Adders in CNN Architecture on FPGA
Fasih Ud Din Farrukh ... Chun Zhang
01 May 2019

A High-Performance and Ultra-Low-Power Accelerator Design for Advanced Deep Learning Algorithms on an FPGA
Achyuth Gundrapally ... Nader Alnatsheh
Electronics | VOL. 13
08 Jul 2024

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
Behzad Salami ... Adrian Cristal Kestelman
01 Jun 2020

RETRACTED ARTICLE: A novel cognitive Wallace compressor based multi operand adders in CNN architecture for FPGA
T Kowsalya
Journal of Ambient Intelligence and Humanized Computing | VOL. 12
07 Aug 2020
