Abstract
Background subtraction is the task of classifying each pixel in a video frame as belonging to a moving object or to the background. In this paper, we propose a fully convolutional encoder–decoder spatial–temporal network (FCESNet) to achieve real-time background subtraction. In the proposed many-to-many architecture, the encoded features of consecutive frames are fed into a spatial–temporal information transmission (STIT) module that captures the spatial–temporal correlation across the frame sequence, and a decoder then outputs the subtraction results for all frames. A patch-based training method is designed to increase the practicability and flexibility of the proposed method. Experiments on CDNet2014 show that the proposed method achieves state-of-the-art performance while running in real time.
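The many-to-many data flow described above (per-frame encoding, temporal feature mixing, per-frame decoding) can be sketched in a toy form. This is only an illustrative NumPy mock-up of the pipeline's shape flow, not the paper's network: the `encode`, `stit`, and `decode` functions here are hypothetical stand-ins (pooling, temporal averaging, and upsampling) for the actual learned convolutional layers and STIT module.

```python
import numpy as np

def encode(frame, stride=4):
    """Toy 'encoder': downsample by average pooling (stand-in for conv layers)."""
    h, w = frame.shape
    return frame[:h - h % stride, :w - w % stride].reshape(
        h // stride, stride, w // stride, stride).mean(axis=(1, 3))

def stit(features):
    """Toy 'STIT' stand-in: blend each frame's features with a shared
    temporal summary, mimicking spatial-temporal information transmission."""
    context = features.mean(axis=0, keepdims=True)  # summary over all frames
    return 0.5 * features + 0.5 * context

def decode(feat, stride=4, thresh=0.5):
    """Toy 'decoder': upsample back to frame size, threshold to a binary mask."""
    up = feat.repeat(stride, axis=0).repeat(stride, axis=1)
    return (up > thresh).astype(np.uint8)

def subtract_background(frames):
    """Many-to-many: N input frames in, N foreground masks out."""
    feats = np.stack([encode(f) for f in frames])  # (N, h', w')
    feats = stit(feats)                            # mix temporal context
    return [decode(f) for f in feats]              # one mask per frame

# Usage: 5 consecutive 64x64 frames yield 5 masks of the same size.
frames = [np.random.rand(64, 64) for _ in range(5)]
masks = subtract_background(frames)
```

The key property this sketch preserves is the many-to-many mapping: every input frame receives its own output mask, with the temporal module operating jointly on the whole feature stack.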