Abstract

Video object detection is challenging because rapid motion, sudden occlusion, and rare poses severely degrade object appearance. The central difficulty is meeting accuracy and speed requirements simultaneously, since improving one usually comes at significant expense to the other. Most existing methods focus on detection accuracy and pay little attention to computational efficiency, which makes them impractical for many real-world applications. This motivates us to develop a video object detection method that is both real-time and highly accurate. In this paper, we propose a novel video object detector, called FastVOD-Net, which yields highly accurate detection results at real-time speed. Specifically, we first develop a temporally-cascaded deformable alignment (TCDA) module to model the object displacements induced by video motion. We then introduce two further modules, spatially-refined temporal aggregation (SRTA) and attention-guided semantic distillation (AGSD), which improve the appearance features of the currently processed frame and enhance the semantic representation of non-keyframes, respectively. For keyframe scheduling, we design an adaptive keyframe selection scheduler (AKSS) that adjusts the keyframe interval online so that keyframes are used more rationally. On the one hand, the design of FastVOD-Net allows expensive feature extraction to be performed only sparsely, which significantly reduces the computational cost and thus enables real-time speed. On the other hand, the collaboration of these tightly coupled modules with the adaptive keyframe scheduler lets FastVOD-Net fully exploit inter-frame temporal dependencies and thus achieve high accuracy. Experiments on the ImageNet VID dataset show that FastVOD-Net achieves 79.3% mAP at 29.6 fps or 81.2% mAP at 23.0 fps on an Nvidia RTX 2080 Ti GPU, which is state-of-the-art performance among real-time detectors.
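
To illustrate why sparse feature extraction lowers the average cost, the following minimal sketch computes the amortized per-frame time when only one frame in every k is a keyframe. It assumes a fixed keyframe interval, whereas AKSS adjusts the interval online. The function names and the timings (80 ms per keyframe, 20 ms per non-keyframe) are hypothetical values chosen for illustration, not measurements from the paper.

```python
def amortized_cost_ms(keyframe_ms, nonkeyframe_ms, interval):
    """Average per-frame cost when one frame in every `interval` is a keyframe.

    Assumes a fixed interval; an adaptive scheduler would vary it over time.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return (keyframe_ms + (interval - 1) * nonkeyframe_ms) / interval


def fps(cost_ms):
    """Convert an average per-frame cost in milliseconds to frames per second."""
    return 1000.0 / cost_ms


# Hypothetical timings for illustration only (not taken from the paper).
KEY_MS = 80.0      # full feature extraction on a keyframe
NONKEY_MS = 20.0   # lightweight processing on a non-keyframe

for k in (1, 5, 10):
    cost = amortized_cost_ms(KEY_MS, NONKEY_MS, k)
    print(f"interval={k:2d}  avg cost={cost:5.1f} ms  fps={fps(cost):5.1f}")
```

Under these assumed timings, lengthening the interval moves the average cost from the keyframe cost toward the non-keyframe cost, which is the speed side of the trade-off that a keyframe scheduler has to balance against accuracy.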
