Abstract

With recent advances in machine learning, neural networks and deep-learning algorithms have become a prevalent subject of computer vision. For tasks such as object classification and detection in particular, Convolutional Neural Networks (CNNs) have surpassed previous traditional approaches. Beyond these applications, CNNs have recently found use in new domains; the parametrization of video encoding algorithms, as in our example, is one such emerging application. The high recognition rate of CNNs makes them particularly suitable for finding Regions of Interest (ROIs) in video sequences, which can be used to adapt the data rate of the compressed video stream accordingly. On the downside, these CNNs require an immense amount of processing power and memory bandwidth. Object detection networks such as You Only Look Once (YOLO) try to balance processing speed and accuracy but still rely on power-hungry GPUs to meet real-time requirements. Specialized hardware such as Field Programmable Gate Array (FPGA) implementations has been shown to strongly mitigate this problem while still providing sufficient computational power. In this paper we propose a flexible architecture for object detection hardware acceleration based on the YOLO v3-tiny model. The reconfigurable accelerator comprises a high-throughput convolution engine, custom blocks for all additional CNN operations, and a programmable control unit that manages on-chip execution. The model can be deployed without significant changes, using 32-bit floating-point values and without further methods that would reduce the model accuracy. Experimental results show that the design accelerates the object detection task to a processing time of 27.5 ms per frame. It is thus real-time capable for 30 FPS applications at a frequency of 200 MHz.
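As a quick sanity check on the reported figures (an illustrative calculation, not part of the paper itself), a per-frame processing time of 27.5 ms corresponds to roughly 36 frames per second, which is indeed above the 30 FPS real-time target:

```python
# Illustrative throughput check based on the figures quoted in the abstract.
frame_time_ms = 27.5          # reported processing time per frame
fps = 1000.0 / frame_time_ms  # achievable frame rate

print(f"{fps:.1f} FPS")       # ~36.4 FPS, above the 30 FPS real-time target
assert fps >= 30.0
```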
