Abstract

Recently, object detection in static images has achieved great success thanks to the application of deep convolutional neural networks. Nevertheless, extending object detectors from images to video is not trivial. Video object detection often suffers from deteriorated object appearance, such as motion blur, video defocus, rare poses, or partial occlusion. This paper presents a novel end-to-end Feature Aligned Recurrent Network (FARN) for video object detection, built on two key modules. The first is a motion-guided feature alignment module, which aligns feature maps of past frames with those of the current frame using optical flow fields estimated by a subnetwork. The second is a spatially aware recurrent module, which consists of an attention unit that adaptively weights feature maps from different frames according to learned image quality, and a recurrent unit that leverages spatio-temporal coherence. FARN works causally in the sense that no future frame is used when detecting objects in the current frame. Evaluated on the ImageNet VID dataset, FARN demonstrates promising performance for causal video object detection compared to state-of-the-art non-causal detectors.
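The two modules can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the flow-estimation subnetwork is replaced by a given flow field, and the learned quality attention is replaced by a simple cosine-similarity score against the current frame; `warp` and `aggregate` are hypothetical names introduced here for illustration.

```python
import numpy as np

def warp(feat, flow):
    """Bilinearly warp a (C, H, W) feature map by a (2, H, W) flow field
    (flow[0] = horizontal displacement, flow[1] = vertical), so a past
    frame's features are aligned to the current frame's coordinates."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = sx - x0, sy - y0
    return (feat[:, y0, x0] * (1 - wx) * (1 - wy)
            + feat[:, y0, x1] * wx * (1 - wy)
            + feat[:, y1, x0] * (1 - wx) * wy
            + feat[:, y1, x1] * wx * wy)

def aggregate(curr, aligned_past):
    """Softmax-attention blend of the current features with flow-aligned
    past features. Per-pixel cosine similarity to the current frame stands
    in for the paper's learned image-quality weights."""
    stack = np.stack([curr] + aligned_past)              # (T, C, H, W)
    norm = np.linalg.norm(stack, axis=1, keepdims=True) + 1e-8
    curr_unit = curr / (np.linalg.norm(curr, axis=0) + 1e-8)
    sim = (stack / norm * curr_unit).sum(axis=1)         # (T, H, W)
    w = np.exp(sim - sim.max(axis=0))                    # softmax over frames
    w /= w.sum(axis=0)
    return (w[:, None] * stack).sum(axis=0)              # (C, H, W)
```

A causal detector would keep a running recurrent state: at each frame, warp the previous state forward by the estimated flow, blend it with the current features via the attention weights, and run the detection head on the result.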
