Abstract

Due to the restricted on-chip computing capability available for deep neural network (DNN) processing, high-definition video object recognition (VOR) is difficult to achieve as a real-time task on a consumer SoC. Although many accelerators have been proposed to speed up VOR, they remain isolated from the video compression knowledge inherent in a video decoder. In this paper, we therefore propose a video decoder-assisted neural network acceleration framework for real-time video recognition. First, since non-key frames are dynamically reconstructed from key frames with high fidelity during video compression, we propose the VR-DANN algorithm, which reconstructs the VOR results of non-key frames in a similar way and thereby saves a large amount of NN computation. In VR-DANN, we leverage motion vectors, the spatio-temporal information already available in the video decoding process, to facilitate recognition, and we propose a lightweight NN-based refinement scheme to suppress the non-pixel recognition noise. Moreover, video frames contain a large amount of redundant information, because the objects of interest usually occupy only a small portion of a frame. We therefore propose an object-based acceleration algorithm (Jigsaw-VOR) that avoids unnecessary computation by dropping the redundant information from the frames before the compute-intensive DNN stage. Concretely, we use the motion vectors to track the rough positions of the objects of interest and then merge them into a consolidated frame for DNN processing, like assembling a jigsaw puzzle. The acceleration comes from processing far fewer consolidated frames than raw frames in a video stream. VR-DANN and Jigsaw-VOR can be integrated for further speedup. On the hardware side, we propose the VR-DANN and Jigsaw-VOR architectures to accelerate the respective algorithms; the two architectures can be combined for higher performance. Our experimental results show that the VR-DANN architecture achieves a 2.9× performance improvement with less than 1% accuracy loss compared with the state-of-the-art “FAVOS” scheme. In addition, applying Jigsaw-VOR to all frames achieves a 2.4× performance improvement with comparable accuracy relative to “FAVOS”. Combining the VR-DANN and Jigsaw-VOR schemes raises the performance improvement to 3.6×.
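To make the VR-DANN idea concrete, the sketch below warps a key-frame segmentation mask to a non-key frame using the decoder's per-block motion vectors. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the (H/16 × W/16 × 2) motion-vector layout, the 16×16 block size, and integer-pel motion are all assumptions, and the lightweight NN-based refinement stage that suppresses the propagated noise is omitted.

```python
import numpy as np

def propagate_mask(key_mask, motion_vectors, block=16):
    """Warp a key-frame segmentation mask to a non-key frame using
    per-block motion vectors (a minimal sketch; assumes H and W are
    multiples of `block` and integer-pel motion vectors)."""
    H, W = key_mask.shape
    out = np.zeros_like(key_mask)
    for by in range(0, H, block):
        for bx in range(0, W, block):
            dy, dx = motion_vectors[by // block, bx // block]
            # Each block of the non-key frame copies the mask labels from
            # the referenced block in the key frame (clamped to the image).
            sy = int(np.clip(by + dy, 0, H - block))
            sx = int(np.clip(bx + dx, 0, W - block))
            out[by:by + block, bx:bx + block] = key_mask[sy:sy + block, sx:sx + block]
    return out

# Hypothetical usage: a 256x256 mask with one motion vector per 16x16 block.
mask = np.zeros((256, 256), dtype=np.uint8)   # key-frame recognition mask
mvs = np.zeros((16, 16, 2), dtype=np.int32)   # per-block (dy, dx) from the decoder
propagated = propagate_mask(mask, mvs)
```

Because the motion vectors already exist as a byproduct of decoding, this propagation replaces a full DNN pass on each non-key frame with a cheap copy, which is where the claimed NN-computation savings come from.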
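Similarly, a minimal sketch of the Jigsaw-VOR consolidation step: regions of interest cropped from several frames are packed shelf-style onto one canvas so that a single DNN pass covers them all. The function name, box format, canvas size, and the simple shelf-packing policy are illustrative assumptions; the paper's placement and result-mapping logic may differ.

```python
import numpy as np

def jigsaw_pack(frames, boxes, canvas_hw=(512, 512)):
    """Pack object crops from several frames into one consolidated canvas.
    `boxes` holds one (top, left, h, w) region per frame, tracked via motion
    vectors; each crop is assumed to fit within the canvas dimensions."""
    H, W = canvas_hw
    canvas = np.zeros((H, W, 3), dtype=frames[0].dtype)
    placements = []          # (frame_idx, box, (y, x)) for mapping results back
    x = y = shelf_h = 0
    for i, (frame, (top, left, h, w)) in enumerate(zip(frames, boxes)):
        if x + w > W:        # row full: start a new shelf below the current one
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + h > H:
            break            # canvas full; remaining crops go to the next canvas
        canvas[y:y + h, x:x + w] = frame[top:top + h, left:left + w]
        placements.append((i, (top, left, h, w), (y, x)))
        x += w
        shelf_h = max(shelf_h, h)
    return canvas, placements
```

The returned placements record where each crop landed, so per-object DNN outputs on the consolidated canvas can be scattered back to their source frames; processing one packed canvas instead of many mostly-background frames is the source of the speedup.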
