Abstract

Video object detection is of great significance for video analysis. In contrast to object detection in still images, video object detection is more challenging because it suffers from motion blur, varying viewpoints/poses, and occlusion. Existing methods exploit temporal information during detection in videos and improve over still-image detectors. In this paper, we propose a novel method for video object detection that adaptively aggregates features across adjacent frames and captures more global cues, making it more robust to drastic appearance changes. First, the current-frame feature and features warped from adjacent frames are obtained via a feature extraction network and an optical flow network. Next, a coherence contribution module adaptively aggregates these two kinds of features. Finally, a still-image detector augmented with an instance-level module, which aggregates features from adjacent frames to capture more global features, produces the final result. Experiments show that our method achieves leading performance on the ImageNet VID dataset.
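The pipeline above can be sketched minimally in NumPy. This is an illustrative sketch only: the abstract does not specify the exact form of the coherence contribution module, so the code below stands in flow-guided warping with nearest-neighbour sampling and adaptive aggregation with a per-location cosine-similarity softmax (both are common choices in flow-based video detection, not necessarily the paper's). The function names `warp_feature` and `aggregate` are hypothetical.

```python
import numpy as np

def warp_feature(feat, flow):
    """Warp an adjacent-frame feature map (C, H, W) toward the current
    frame using a (dy, dx) flow field of shape (2, H, W).
    Nearest-neighbour sampling for brevity; real systems typically
    use bilinear sampling."""
    C, H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[1]).astype(int), 0, W - 1)
    return feat[:, src_y, src_x]

def aggregate(cur_feat, warped_feats, eps=1e-8):
    """Adaptively aggregate the current feature with warped adjacent
    features, weighting each spatial location by cosine similarity to
    the current frame (a stand-in for the coherence contribution
    module, whose exact form the abstract does not give)."""
    feats = [cur_feat] + list(warped_feats)
    sims = []
    for f in feats:
        num = (f * cur_feat).sum(axis=0)
        den = np.linalg.norm(f, axis=0) * np.linalg.norm(cur_feat, axis=0) + eps
        sims.append(num / den)
    sims = np.stack(sims)                       # (T, H, W)
    w = np.exp(sims) / np.exp(sims).sum(axis=0) # softmax over frames
    return sum(wi[None] * fi for wi, fi in zip(w, feats))
```

With zero flow the warp is the identity, and aggregating identical features returns the current feature unchanged, since the softmax weights sum to one at every location.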
