Abstract

In background subtraction tasks, spatial and temporal contexts are beneficial for detecting moving objects. Methods based on deep neural networks for this task have explored different topologies composed of conventional operations, such as convolutional long short-term memory (ConvLSTM) layers, 2D convolutional layers, and 3D convolutional layers, to capture these contexts. In this work, we propose a new background subtraction algorithm named the spatial–temporal propagation network. It is an end-to-end network with novel layers, whose operation is equivalent to multiplying feature maps by affinity matrices, designed to capture spatial–temporal correlations in video sequences and aggregate deep features from consecutive frames. Experimental results on the CDnet-2014 and LASIESTA datasets show that this novel layer provides an alternative way for our network to aggregate multiscale spatial–temporal features. Moreover, the proposed network achieves state-of-the-art performance and generalizes to unseen videos.
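
To make the core operation concrete, the following is a minimal sketch of affinity-based feature propagation: flattened spatial–temporal feature vectors are aggregated by multiplying them with a row-normalized affinity matrix, so each output position becomes a weighted combination of all positions. The function name `affinity_propagate`, the similarity-based affinity construction, and the toy dimensions are illustrative assumptions, not the paper's actual layer.

```python
import numpy as np

def affinity_propagate(features: np.ndarray, affinity: np.ndarray) -> np.ndarray:
    """Aggregate features by multiplying with a row-normalized affinity matrix.

    features: (N, C) array, N flattened spatial-temporal positions, C channels.
    affinity: (N, N) non-negative pairwise affinities (assumed learned upstream).
    """
    # Row-normalize so each output row is a convex combination of input rows.
    weights = affinity / (affinity.sum(axis=1, keepdims=True) + 1e-8)
    return weights @ features


# Toy usage: affinities derived from feature similarity over two consecutive
# 4x4 frames (T=2, H=W=4), giving N = T*H*W = 32 positions with 8 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((2 * 4 * 4, 8))
sim = feats @ feats.T                               # raw pairwise similarities
aff = np.exp(sim - sim.max(axis=1, keepdims=True))  # softmax-style positivity
out = affinity_propagate(feats, aff)
print(out.shape)  # (32, 8)
```

In a trained network, the affinity matrix would be predicted by learned layers rather than computed from raw similarities as in this sketch; normalizing its rows keeps the propagation stable regardless of the affinity scale.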
