Abstract

Object detection and classification is an essential task in computer vision. A very efficient algorithm for detection and classification is YOLO (You Only Look Once). We consider hardware architectures to run YOLO in real-time on embedded platforms. Designing a new dedicated accelerator for each new version of YOLO is not feasible given how quickly new versions are released. This work’s primary goal is to design a configurable and scalable core for creating specific object detection and classification systems based on YOLO, targeting embedded platforms. The core accelerates the execution of all the algorithm steps, including pre-processing, model inference and post-processing. It considers a fixed-point format, linearised activation functions, batch-normalisation folding, and a hardware structure that exploits most of the available parallelism in CNN processing. The proposed core is configured for real-time execution of YOLOv3-Tiny and YOLOv4-Tiny, integrated into a RISC-V-based system-on-chip architecture and prototyped on an UltraScale XCKU040 FPGA (Field Programmable Gate Array). The solution achieves a performance of 32 and 31 frames per second for YOLOv3-Tiny and YOLOv4-Tiny, respectively, with a 16-bit fixed-point format. Compared to previous proposals, it improves the frame rate while achieving higher performance efficiency. The performance, area efficiency and configurability of the proposed core enable the fast development of real-time YOLO-based object detectors on embedded systems.
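To make two of the techniques named above more concrete, the sketch below shows the general idea of folding batch-normalisation parameters into the preceding convolution and quantising values to a 16-bit fixed-point format. It is only an illustration of these standard techniques under assumed names and layouts (fold_batch_norm, to_fixed_point, a Q8.8 split, channel-first weights), not the paper's actual implementation.

```python
import numpy as np

def fold_batch_norm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the preceding convolution.

    W: conv weights of shape (out_channels, in_channels, k, k)
    b: conv bias of shape (out_channels,) (zeros if the layer has none)
    gamma, beta, mean, var: per-channel BN parameters, shape (out_channels,)
    Returns folded weights and bias so that conv(x, W_f) + b_f
    equals BN(conv(x, W) + b).
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    W_folded = W * scale[:, None, None, None]   # scale each output filter
    b_folded = (b - mean) * scale + beta        # absorb the BN shift into the bias
    return W_folded, b_folded

def to_fixed_point(x, frac_bits=8):
    """Quantise to 16-bit fixed point (Q8.8 here, chosen only for
    illustration; the format used by the core may differ)."""
    scaled = np.round(np.asarray(x) * (1 << frac_bits))
    return np.clip(scaled, -(1 << 15), (1 << 15) - 1).astype(np.int16)
```

Folding removes the batch-normalisation step at inference time, which is why it is attractive for a hardware pipeline: the accelerator only ever sees a convolution followed by an activation.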

Highlights

  • Object detectors have a wide range of applications in fields such as security, transportation, the military and medicine

  • The main enablers include the development of Deep Neural Networks (DNNs), mostly Convolutional Neural Networks (CNNs), and the increase in hardware computing power

  • This paper proposes a configurable core for the efficient execution of full object detectors based on Tiny You Only Look Once (YOLO)


Summary

INTRODUCTION

Object detectors have a wide range of applications in fields such as security, transportation, the military and medicine. One-stage detectors treat object detection as a regression/classification problem, adopting a unified framework to obtain labels and locations directly. These detectors map straight from image pixels to bounding box coordinates and class probabilities, predicting boxes directly from input images without a region proposal step. Recent studies [18], [19] have used Field Programmable Gate Arrays (FPGAs) as a more energy-efficient alternative to GPUs for executing DNNs. FPGAs provide advantages such as high flexibility in dedicated hardware design, fixed-point calculation, parallel computing and low power consumption. While running Tiny-YOLOv3 on 768 × 576 images, the solution achieves a performance of 32 frames per second, with the hardware accelerator running at a 143 MHz clock frequency.
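To illustrate the "pixels to bounding boxes" mapping performed by a one-stage detector, the sketch below decodes a single YOLO-style grid-cell prediction into a box and per-class confidences. This is the generic Tiny-YOLO decoding scheme with assumed names (decode_cell, normalised anchors), shown only as an example; the post-processing implemented in the proposed core may differ in its details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(t, cell_x, cell_y, grid_w, grid_h, anchor_w, anchor_h):
    """Decode one prediction vector t = [tx, ty, tw, th, obj, c1..cn]
    produced for a single grid cell and anchor.

    Returns a box (cx, cy, w, h) in normalised image coordinates and
    the per-class confidences.
    """
    tx, ty, tw, th, obj = t[:5]
    cx = (cell_x + sigmoid(tx)) / grid_w          # box centre, x
    cy = (cell_y + sigmoid(ty)) / grid_h          # box centre, y
    w = anchor_w * np.exp(tw)                     # width from the anchor prior
    h = anchor_h * np.exp(th)                     # height from the anchor prior
    class_conf = sigmoid(obj) * sigmoid(t[5:])    # objectness × class scores
    return (cx, cy, w, h), class_conf
```

In a full detector this decoding is applied to every cell and anchor of each output scale, followed by confidence thresholding and non-maximum suppression, which is the post-processing stage the core accelerates alongside inference.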

BACKGROUND
BATCH-NORMALIZATION
RESULTS
CONCLUSION