Abstract

This paper proposes an end-to-end deep learning framework, termed the motion-aid feature calibration network (MFCN), for video object detection. The key idea is to leverage the temporal coherence of video features while accounting for their motion patterns as captured by optical flow. To boost detection accuracy, the framework aggregates calibrated features across frames at both the pixel and instance levels, improving robustness to appearance variations. Calibration and aggregation are performed efficiently and adaptively through an integrated optical flow network. Because the entire architecture is trainable end-to-end, training and inference are significantly more efficient than in multi-stage video object detection methods. Evaluations on KITTI and ImageNet VID show that MFCN improves the results of a strong still-image detector by 11.2% and 7.31%, respectively. MFCN also outperforms other competitive video object detectors and achieves a better trade-off between accuracy and runtime speed, demonstrating its potential for use in autonomous driving systems.
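
At a high level, the pixel-level calibration described above can be read as warping neighbor-frame features toward the reference frame along the estimated optical flow, then fusing the warped features with adaptive per-location weights. The sketch below illustrates only this generic flow-guided warp-and-aggregate idea, not the paper's exact design; the function names, the cosine-similarity weighting, and the PyTorch implementation are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp neighbor-frame features toward the reference frame.

    feat: (N, C, H, W) feature map of a neighboring frame.
    flow: (N, 2, H, W) flow from the reference frame to that
          neighbor, in pixels (x displacement first, then y).
    """
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    # Displace the reference sampling grid by the flow, then
    # normalize coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * (xs + flow[:, 0]) / (w - 1) - 1.0   # (N, H, W)
    gy = 2.0 * (ys + flow[:, 1]) / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)            # (N, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

def aggregate(ref_feat, neighbor_feats, flows):
    """Fuse reference features with flow-calibrated neighbor features,
    weighting each spatial location by its similarity to the reference
    frame (one common adaptive-weighting choice; assumed here)."""
    warped = [warp_features(f, fl) for f, fl in zip(neighbor_feats, flows)]
    feats = torch.stack([ref_feat] + warped)        # (T, N, C, H, W)
    sim = F.cosine_similarity(feats, ref_feat.unsqueeze(0), dim=2)
    w = torch.softmax(sim, dim=0).unsqueeze(2)      # (T, N, 1, H, W)
    return (w * feats).sum(dim=0)                   # fused (N, C, H, W)
```

A detection head would then operate on the aggregated feature map; an analogous scheme at the instance level would warp and fuse per-proposal (RoI) features rather than whole feature maps.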
