Abstract

Video object detection improves on still-image object detection by exploiting temporal context from neighboring frames. Most state-of-the-art video object detectors are non-causal and require many preceding and succeeding frames, which makes them impractical for real-time online detection, where succeeding frames are not yet available. In this paper, we propose a causal recurrent flow-based method for online video object detection. At each time step, the proposed method reads only the current frame and one preceding frame from a memory buffer. Two types of temporal context are utilized. Short-term temporal context is captured by warping the feature map of a nearby preceding frame to the current frame based on optical flow. Long-term temporal context is extracted from a temporal convolutional LSTM, in which informative features from distant preceding frames are stored and propagated through time. By aggregating both long- and short-term temporal context, our method achieves competitive performance (75.5% mAP) on the ImageNet VID dataset at a relatively high speed.
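The core idea of flow-guided feature warping can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it uses nearest-neighbor sampling instead of a learned bilinear warp, and a fixed convex combination (`aggregate` with a hypothetical `alpha` parameter) stands in for the paper's learned aggregation of short- and long-term context.

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a feature map from a preceding frame toward the current frame
    using optical flow (nearest-neighbor sampling for simplicity).

    feat: (H, W, C) feature map of the preceding frame.
    flow: (H, W, 2) per-pixel (dy, dx) displacement pointing from each
          current-frame location to its source in the preceding frame.
    """
    H, W, _ = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Source coordinates, rounded and clipped to the feature-map borders.
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return feat[src_y, src_x]

def aggregate(short_term, long_term, alpha=0.5):
    """Blend the flow-warped short-term features with long-term memory
    features (e.g. a ConvLSTM hidden state) by a fixed weight `alpha`;
    the actual method would learn this fusion."""
    return alpha * short_term + (1.0 - alpha) * long_term
```

In an online setting, `warp_features` would be applied to the buffered feature map of the single preceding frame, and `aggregate` would fuse the result with the recurrent memory state before detection heads run on the fused features.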
