Abstract

Although two-stage video object detectors cannot perform detection in real time, their accuracy is normally higher than that of one-stage video object detectors. One reason is that two-stage detectors can easily use feature information from adjacent frames to augment key-frame features. How to extract and exploit temporal features in the video stream for one-stage detectors needs further exploration. CenterNet is an anchor-free one-stage object detector that regresses bounding boxes from heatmap peaks. We propose to use the detected peaks and the regressed boxes that encompass the peak points to determine heatmap ROIs, which serve as the extracted object heatmap features. A new relation module is designed to evaluate the similarity of heatmap ROI features and output relation features that effectively augment the heatmap ROI features. In the video sequence, the heatmap ROIs of multiple adjacent frames are aggregated into the heatmap ROI of the key frame. Compared to CenterNet and other CenterNet-based video object detectors, our method achieves improved online real-time performance on the ImageNet VID dataset, with 78.8% mAP at 36 FPS.
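To fix ideas, the following is a minimal PyTorch-style sketch of a relation step of the kind described above: support-frame heatmap ROI features are weighted by their similarity to the key-frame ROI features and aggregated back onto the key frame. The module name `HeatmapROIRelation`, the feature dimensions, and the dot-product similarity weighting are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: a simplified relation module that weights heatmap ROI
# features from adjacent frames by their similarity to the key-frame ROI features
# and aggregates them onto the key frame. Names, dimensions, and the dot-product
# similarity are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeatmapROIRelation(nn.Module):
    def __init__(self, feat_dim=64, embed_dim=32):
        super().__init__()
        # Linear embeddings used to compare key-frame and support-frame ROI features.
        self.query = nn.Linear(feat_dim, embed_dim)
        self.key = nn.Linear(feat_dim, embed_dim)
        self.value = nn.Linear(feat_dim, feat_dim)

    def forward(self, key_rois, support_rois):
        # key_rois:     (N, feat_dim) ROI features from the key frame
        # support_rois: (M, feat_dim) ROI features pooled from adjacent frames
        q = self.query(key_rois)                      # (N, embed_dim)
        k = self.key(support_rois)                    # (M, embed_dim)
        v = self.value(support_rois)                  # (M, feat_dim)

        # Pairwise similarity between key-frame and support-frame ROIs.
        sim = q @ k.t() / k.shape[-1] ** 0.5          # (N, M)
        weights = F.softmax(sim, dim=-1)              # relation weights

        # Relation features: similarity-weighted aggregation of support ROIs.
        relation = weights @ v                        # (N, feat_dim)

        # Augment the key-frame ROI features with the aggregated relation features.
        return key_rois + relation


# Example usage with random tensors standing in for pooled heatmap ROI features.
if __name__ == "__main__":
    module = HeatmapROIRelation(feat_dim=64)
    key_frame_rois = torch.randn(5, 64)    # 5 detected peaks in the key frame
    adjacent_rois = torch.randn(12, 64)    # ROIs gathered from adjacent frames
    augmented = module(key_frame_rois, adjacent_rois)
    print(augmented.shape)                 # torch.Size([5, 64])
```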
