A cross frame post-processing strategy for video object detection

Xin Song, Ziqiang Qi, Jianlin Zhu, Shuhua Li

doi:10.1016/j.displa.2022.102230

Abstract

Video-based object detection plays an important role in both real-world applications and scientific research. Compared with detection in still images, video detection is more challenging because of occlusion, rare poses, high-speed movement, frame loss, and similar factors. To improve a wide range of existing video-stream detectors with low coupling, this work proposes a post-processing strategy, CFPP. The framework establishes cross-frame links based on deep learning, connects the proposals that belong to the same object, and improves detector performance by optimizing the classification confidence and the object coordinates. Furthermore, CFPP can connect proposals in adjacent and non-adjacent frames at the same time, which lets it exploit the context information of the video stream more effectively than other post-processing strategies. Experiments show that CFPP can improve existing detectors (e.g. we improve the mAP of YOLOv4 on the ImageNet VID dataset from 69.24% to 78.15%). In addition, experiments show that the designed framework achieves better detection performance than other strategies in the presence of high-speed moving objects and frame loss.
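The general idea of linking proposals across frames and then refining their confidence can be illustrated with a minimal sketch. This is not the CFPP implementation: the abstract states that CFPP builds its links with deep learning, whereas the sketch below substitutes a plain IoU overlap test as a stand-in, and it only rescores confidences (coordinates are left untouched). The names `Box`, `iou`, `link_and_rescore`, `iou_thr`, and `max_gap` are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One detector proposal (hypothetical structure, not from the paper)."""
    frame: int
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def link_and_rescore(boxes: List[Box], iou_thr: float = 0.5,
                     max_gap: int = 2) -> List[List[Box]]:
    """Greedily chain proposals over time, then average scores per chain.

    ASSUMPTION: IoU replaces the learned cross-frame link described in the
    abstract. max_gap > 1 allows links between non-adjacent frames, so a
    chain can survive a dropped frame. Scores are updated in place.
    """
    chains: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.frame):
        best, best_iou = None, iou_thr
        for chain in chains:
            last = chain[-1]
            gap = box.frame - last.frame
            # Only link forward in time, and only within max_gap frames.
            if 1 <= gap <= max_gap:
                overlap = iou(last, box)
                if overlap >= best_iou:
                    best, best_iou = chain, overlap
        if best is None:
            chains.append([box])   # start a new chain
        else:
            best.append(box)       # extend the best-matching chain
    for chain in chains:
        mean_score = sum(b.score for b in chain) / len(chain)
        for b in chain:
            b.score = mean_score   # simple confidence smoothing
    return chains
```

With `max_gap=2`, a proposal in frame 3 can still join a chain whose last member is in frame 1, which mimics in a crude way how links across non-adjacent frames help when a frame is lost. Averaging is only one possible rescoring rule; the abstract does not specify the one CFPP uses.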

Full Text