Efficient GEMM Implementation for Vision-Based Object Detection in Autonomous Driving Applications

Fatima Zahra Guerrouj, Mustapha Ramzi, Sergio Rodríguez Flórez, Abdelhafid El Ouardi, Mohamed Abouzahir

doi:10.3390/jlpea13020040

Abstract

Convolutional Neural Networks (CNNs) have been highly effective for object detection tasks. YOLOv4 is a state-of-the-art object detection algorithm designed for embedded systems. It builds on YOLOv3 and improves on its accuracy, speed, and robustness. However, deploying CNNs on embedded systems such as Field Programmable Gate Arrays (FPGAs) is difficult because these devices have limited resources. To address this issue, FPGA-based CNN architectures have been developed to improve the resource utilization of CNNs, resulting in improved accuracy and speed. This paper examines the use of General Matrix Multiplication (GEMM) operations to accelerate the execution of YOLOv4 on embedded systems. It reviews the most recent GEMM implementations and evaluates their accuracy and robustness. It also discusses the challenges of deploying YOLOv4 on autonomous vehicle datasets. Finally, the paper presents a case study demonstrating the successful implementation of YOLOv4 on an Intel Arria 10 embedded system using GEMM.
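
The abstract does not say how the authors map YOLOv4 onto GEMM. As a rough illustration of the general idea only, the sketch below uses the common im2col lowering, in which each input patch is unrolled into a column so that a convolution layer becomes a single matrix product. This is an assumption about the technique, not the paper's implementation, and the function names `im2col`, `gemm`, and `conv2d_gemm` are hypothetical.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image (list of rows) into one column."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    # One row per kernel element, one column per output pixel.
    cols = [[0.0] * (out_h * out_w) for _ in range(kh * kw)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(kh):
                for dj in range(kw):
                    cols[di * kw + dj][i * out_w + j] = image[i + di][j + dj]
    return cols, out_h, out_w


def gemm(a, b):
    """Plain triple-loop matrix product C = A * B (no tiling or vectorization)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def conv2d_gemm(image, kernels):
    """Valid, stride-1 cross-correlation of one image with several kernels via GEMM."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    cols, out_h, out_w = im2col(image, kh, kw)
    # Flatten each kernel into one row: (num_kernels) x (kh * kw).
    weights = [[v for row in kernel for v in row] for kernel in kernels]
    flat = gemm(weights, cols)
    # Reshape each output row back into an out_h x out_w feature map.
    return [[row[i * out_w:(i + 1) * out_w] for i in range(out_h)] for row in flat]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [[[1, 0], [0, 1]],   # sums the main diagonal of each patch
           [[1, 1], [1, 1]]]   # sums all four elements of each patch
print(conv2d_gemm(image, kernels))
# [[[6.0, 8.0], [12.0, 14.0]], [[12.0, 16.0], [24.0, 28.0]]]
```

Lowering to GEMM trades extra memory for regularity: the unrolled patch matrix duplicates overlapping pixels, but the resulting dense matrix product is the kind of workload that accelerators can pipeline and tile well.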

Journal: Journal of Low Power Electronics and Applications
Publication Date: Jun 6, 2023
Citations: 2
License type: CC BY 4.0

Similar Papers

Efficient Hardware Optimization for CNN
Seda Güzel Aydın ... Hasan Şakir Bilge
International Journal of Multidisciplinary Studies and Innovative Technologies | VOL. 6
01 Jan 2021

Instruction Driven Cross-layer CNN Accelerator for Fast Detection on FPGA
Jincheng Yu ... Yu Wang
ACM Transactions on Reconfigurable Technology and Systems | VOL. 11
30 Sep 2018

An Effective Design to Improve the Efficiency of DPUs on FPGA
Yutian Lei ... Sangyoon Oh
01 Dec 2020

High-Performance Acceleration of 2-D and 3-D CNNs on FPGAs Using Static Block Floating Point.
Hongxiang Fan ... Shuanglong Liu
IEEE Transactions on Neural Networks and Learning Systems | VOL. PP
01 Aug 2023
