Abstract
Object detection and classification is an essential task of computer vision. A very efficient algorithm for detection and classification is YOLO (You Only Look Once). We consider hardware architectures to run YOLO in real-time on embedded platforms. Designing a new dedicated accelerator for each new version of YOLO is not feasible given the fast delivery of new versions. This work's primary goal is to design a configurable and scalable core for creating specific object detection and classification systems based on YOLO, targeting embedded platforms. The core accelerates the execution of all the algorithm steps, including pre-processing, model inference and post-processing. It considers a fixed-point format, linearised activation functions, batch-normalisation folding, and a hardware structure that exploits most of the available parallelism in CNN processing. The proposed core is configured for real-time execution of YOLOv3-Tiny and YOLOv4-Tiny, integrated into a RISC-V-based system-on-chip architecture and prototyped in an UltraScale XCKU040 FPGA (Field Programmable Gate Array). The solution achieves a performance of 32 and 31 frames per second for YOLOv3-Tiny and YOLOv4-Tiny, respectively, with a 16-bit fixed-point format. Compared to previous proposals, it improves the frame rate at a higher performance efficiency. The performance, area efficiency and configurability of the proposed core enable the fast development of real-time YOLO-based object detectors on embedded systems.
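The abstract mentions batch-normalisation folding as one of the optimisations applied before inference. The paper does not give the exact procedure, but the standard transformation, sketched below as an assumption, absorbs the batch-norm scale and shift into the preceding convolution's weights and bias so the accelerator executes a single fused layer:

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the preceding convolution.

    W: conv weights, shape (out_ch, in_ch, kh, kw); b: bias (out_ch,).
    The folded layer computes gamma * (conv(x) + b - mean) / sqrt(var + eps)
    + beta as a single convolution with rescaled weights and adjusted bias.
    """
    scale = gamma / np.sqrt(var + eps)      # per-output-channel scale factor
    W_f = W * scale[:, None, None, None]    # rescale each output filter
    b_f = (b - mean) * scale + beta         # fold mean/shift into the bias
    return W_f, b_f
```

After folding, the fused weights can be quantised once to the fixed-point format, removing the batch-norm arithmetic from the inference datapath entirely.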
Highlights
Object detectors have a wide range of application fields such as security, transportation, military and medical
Key enabling factors are the development of Deep Neural Networks (DNNs), mostly Convolutional Neural Networks (CNNs), and the increase in hardware computing power
This paper proposes a configurable core for the efficient execution of full object detectors based on Tiny You Only Look Once (YOLO)
Summary
D. Pestana et al.: Full Featured Configurable Accelerator for Object Detection With YOLO

Object detectors have a wide range of application fields, such as security, transportation, military and medical. One-stage detectors treat object detection as a regression/classification problem, adopting a unified framework to obtain labels and locations directly: they map straight from image pixels to bounding-box coordinates and class probabilities, predicting boxes from input images without a region-proposal step. Recent studies [18], [19] have used Field Programmable Gate Arrays (FPGAs) as a more energy-efficient alternative to GPUs for executing DNNs. FPGAs provide advantages such as flexible dedicated hardware design, fixed-point computation, parallel computing and low power consumption. Running Tiny-YOLOv3 on 768 × 576 images, the solution achieves a performance of 32 frames per second, with the hardware accelerator clocked at 143 MHz.
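The summary notes that fixed-point computation is one of the FPGA advantages the core exploits, and the abstract specifies a 16-bit fixed-point format. The exact Q-format split between integer and fractional bits is not stated in this excerpt, so the sketch below assumes a Q8.8 layout purely for illustration, with rounding and saturation as a hardware datapath would apply them:

```python
def to_fixed(x, frac_bits=8, word_bits=16):
    """Quantise a float to a signed fixed-point word (Q8.8 by default).

    Rounds to the nearest representable step and saturates to the signed
    16-bit range, mirroring what a fixed-point hardware unit would do.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))            # -32768 for 16-bit words
    hi = (1 << (word_bits - 1)) - 1         # +32767 for 16-bit words
    q = int(round(x * scale))
    return max(lo, min(hi, q))

def to_float(q, frac_bits=8):
    """Recover the real value represented by a fixed-point word."""
    return q / (1 << frac_bits)
```

With 8 fractional bits the quantisation step is 1/256, so weights and activations lose at most half a step to rounding; out-of-range values clamp to the word limits instead of wrapping.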