Abstract

In video object detection, an object’s appearance can deteriorate in a single frame, which makes recognition harder; it is therefore natural to exploit temporal information to make detection more robust. Existing methods usually use temporal information to enhance features and often ignore the information in label assignment. Label assignment, which assigns labels to anchors for training, is an essential part of object detection. It faces the same challenges in video object detection and can likewise be improved with temporal information. In this work, a temporal-guided label assignment framework is proposed for the learning task of a region proposal network (RPN). Specifically, we propose a feature instructing module (FIM) that models the relations among labels through feature similarity in the temporal dimension. The proposed video object detection framework was evaluated on the ImageNet VID benchmark. Without any additional inference cost, our work improved mean average precision (mAP) by 0.8 percentage points over the baseline and achieved an mAP of 82.0%. This result is on par with state-of-the-art accuracy without using any post-processing methods.
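
For readers unfamiliar with label assignment, the following is a minimal sketch of the conventional single-frame scheme that an RPN typically uses: each anchor is labeled by its best intersection over union (IoU) with the ground-truth boxes. It is not the temporal-guided method or the FIM proposed in this work, whose details the abstract does not give. The function names `iou` and `assign_labels` and the thresholds 0.7 and 0.3 are assumptions chosen for illustration (they follow commonly used RPN defaults).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground-truth box (0 if none).
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous overlap: excluded from training
    return labels


anchors = [(0, 0, 10, 10), (0, 0, 10, 20), (50, 50, 60, 60)]
gt_boxes = [(0, 0, 10, 10)]
print(assign_labels(anchors, gt_boxes))
```

Because this rule looks only at box geometry in one frame, it cannot account for frames in which the object’s appearance has deteriorated; that is the gap a temporal-guided assignment is meant to address.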
