Abstract

Video object detection is of great significance for video analysis. In contrast to object detection in still images, video object detection is more challenging, as it suffers from motion blur, varying viewpoints/poses, and occlusion. Existing methods utilize temporal information during detection in videos and show improvement over still-image detectors. In this paper, we propose a novel method for video object detection that can adaptively aggregate features across adjacent frames as well as capture more global cues, so as to be more robust to drastic appearance changes. First, the current-frame feature and warped features from adjacent frames are obtained via a feature extraction network and an optical flow network. Next, a coherence contribution module is designed to adaptively aggregate these two kinds of features. Finally, the still-image detector, extended with an instance-level module that aggregates features from adjacent frames to capture more global features, is adopted to obtain the final result. Experimental results show that our method achieves leading performance on the ImageNet VID dataset.
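The pipeline described above — warping adjacent-frame features toward the current frame with optical flow, then adaptively weighting them during aggregation — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the flow field, the nearest-neighbour warping, and the cosine-similarity weights (a stand-in for the learned coherence contribution module) are all simplifying assumptions.

```python
import numpy as np

def warp_feature(feat, flow):
    """Warp an adjacent-frame feature map toward the current frame.

    feat: (H, W, C) feature map; flow: (H, W, 2) pixel offsets.
    Nearest-neighbour sampling is a simplification; flow-guided methods
    typically use bilinear sampling.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return feat[src_y, src_x]

def aggregate(cur_feat, warped_feats):
    """Adaptively aggregate the current-frame feature with warped
    adjacent-frame features.

    Each feature is weighted by its softmax-normalised cosine similarity
    to the current-frame feature -- a hand-crafted proxy for the learned
    coherence contribution weights in the paper.
    """
    feats = [cur_feat] + list(warped_feats)
    sims = np.array([
        (f * cur_feat).sum()
        / (np.linalg.norm(f) * np.linalg.norm(cur_feat) + 1e-8)
        for f in feats
    ])
    w = np.exp(sims - sims.max())
    w /= w.sum()
    return sum(wi * f for wi, f in zip(w, feats))
```

For example, with zero flow the warp is the identity, and aggregating a frame with an identical neighbour returns the frame unchanged, since all weights act on the same feature map.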

