Abstract

Moving object detection in video surveillance systems is a critical task for many computer vision applications. Nevertheless, extracting objects from real-world surveillance video remains challenging: variations in appearance, shape, and lighting conditions are still unsolved problems. Exploiting context from adjacent frames is believed to be valuable for tackling these challenges in surveillance video, yet it remains underexplored. In this paper, we introduce a novel one-stage approach, named SDNN, to detect objects using multiple successive frames in videos. Specifically, the network fuses the context information of multiple frames and combines predictions from multi-scale feature maps at different layers. The multi-frame feature fusion scheme enables the training process to follow an end-to-end fashion. Experimental results on a surveillance video dataset show that the proposed SDNN achieves state-of-the-art results. The source code is available at https://github.com/jmuyjl/SDNN.
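
To make the two ideas in the abstract concrete, below is a minimal sketch (not the authors' code; see the linked repository for the actual implementation) of how multi-frame feature fusion and multi-scale prediction can be wired together so that the whole pipeline trains end-to-end. All module and parameter names here (`FrameFusion`, `MultiScaleHead`, `num_frames`, and so on) are hypothetical illustrations, not names from the paper.

```python
import torch
import torch.nn as nn

class FrameFusion(nn.Module):
    """Fuse per-frame feature maps by concatenating along channels."""
    def __init__(self, in_channels: int, num_frames: int):
        super().__init__()
        # A 1x1 conv reduces the concatenated frames back to in_channels,
        # so the rest of the network keeps its single-frame shape.
        self.fuse = nn.Conv2d(in_channels * num_frames, in_channels, kernel_size=1)

    def forward(self, frame_feats):  # list of (B, C, H, W) tensors
        return self.fuse(torch.cat(frame_feats, dim=1))

class MultiScaleHead(nn.Module):
    """Predict boxes and classes from feature maps at several scales."""
    def __init__(self, channels, num_anchors: int, num_classes: int):
        super().__init__()
        out = num_anchors * (4 + num_classes)  # box offsets + class scores
        self.heads = nn.ModuleList(
            nn.Conv2d(c, out, kernel_size=3, padding=1) for c in channels
        )

    def forward(self, feats):  # one feature map per scale
        return [head(f) for head, f in zip(self.heads, feats)]

# Hypothetical usage: fuse features from 3 successive frames, then predict.
# Because fusion and prediction are ordinary differentiable layers, the
# pipeline can be trained end-to-end with a standard detection loss.
fusion = FrameFusion(in_channels=256, num_frames=3)
fused = fusion([torch.randn(1, 256, 32, 32) for _ in range(3)])
head = MultiScaleHead(channels=[256], num_anchors=6, num_classes=2)
preds = head([fused])
print(preds[0].shape)  # torch.Size([1, 36, 32, 32])
```

In a full detector, the head would consume feature maps from several backbone layers rather than a single scale, which is what the abstract means by combining predictions from multi-scale feature maps at different layers.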
