Abstract

Despite recent breakthroughs in object detection on static images, extending state-of-the-art object detectors from images to video is challenging. Detection accuracy suffers from deteriorated object appearances in videos, e.g., occlusion, video defocus, and motion blur. In this paper, we present a new framework, Missing Recover Recurrent Neural Networks (MR-RNN), for improving object detection in videos; it captures temporal information to recover missing objects. First, we detect objects in consecutive frames to obtain bounding boxes and their confidence scores; the detector runs on every frame of the video. Then we feed these detections into a recurrent neural network (LSTM [8] or BiLSTM [4]) to capture temporal information. The method is evaluated on the large-scale vehicle dataset DETRAC. Our approach achieves an Average Precision (AP) of 68.90 with an SSD base detector, an improvement of 2.68 over SSD alone. Experimental results show that our method successfully detects many objects that are missed by the base detector.
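The abstract does not give implementation details, so as a rough illustration of the core idea, feeding per-frame detection features through an LSTM cell, here is a minimal NumPy sketch. All dimensions, weight shapes, and names are hypothetical, and the input vectors stand in for one frame's box coordinates and confidence score; this is not the authors' actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a per-frame feature vector x.

    W, U, b stack the four gates in the order:
    input (i), forget (f), output (o), candidate (g).
    """
    z = W @ x + U @ h + b
    n = h.size
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:4 * n])  # candidate cell state
    c_new = f * c + i * g        # blend old memory with new evidence
    h_new = o * np.tanh(c_new)   # hidden state carries temporal context
    return h_new, c_new

# Toy sequence: one feature vector per video frame
# (e.g. 4 box coordinates + 1 confidence score -> in_dim = 5).
rng = np.random.default_rng(0)
in_dim, hid_dim, num_frames = 5, 8, 6
W = rng.normal(scale=0.1, size=(4 * hid_dim, in_dim))
U = rng.normal(scale=0.1, size=(4 * hid_dim, hid_dim))
b = np.zeros(4 * hid_dim)

h = np.zeros(hid_dim)
c = np.zeros(hid_dim)
for t in range(num_frames):
    x = rng.normal(size=in_dim)  # stand-in for frame t's detection features
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)  # hidden state summarizing the frames seen so far
```

A BiLSTM variant would run a second pass over the frames in reverse order and concatenate the two hidden states, letting a detection be recovered from both earlier and later frames.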
