Abstract
Background subtraction is the task of classifying each pixel in a video frame as belonging to a moving object or to the background. In this paper, we propose a fully convolutional encoder–decoder spatial–temporal network (FCESNet) to achieve real-time background subtraction. In the proposed many-to-many architecture, the encoded features of consecutive frames are fed into a spatial–temporal information transmission (STIT) module that captures the spatial–temporal correlation across the frame sequence, and a decoder then outputs the subtraction results for all frames. A patch-based training method is designed to increase the practicability and flexibility of the proposed method. Experiments on CDNet2014 show that the proposed method achieves state-of-the-art performance while running in real time.
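The many-to-many data flow described above (per-frame encoding, temporal feature mixing, per-frame decoding) can be sketched in a toy form. This is only an illustrative NumPy mock-up of the pipeline's shape flow, not the paper's network: the `encode`, `stit`, and `decode` functions here are hypothetical stand-ins (pooling, temporal averaging, and upsampling) for the actual learned convolutional layers and STIT module.

```python
import numpy as np

def encode(frame, stride=4):
    """Toy 'encoder': downsample by average pooling (stand-in for conv layers)."""
    h, w = frame.shape
    return frame[:h - h % stride, :w - w % stride].reshape(
        h // stride, stride, w // stride, stride).mean(axis=(1, 3))

def stit(features):
    """Toy 'STIT' stand-in: blend each frame's features with a shared
    temporal summary, mimicking spatial-temporal information transmission."""
    context = features.mean(axis=0, keepdims=True)  # summary over all frames
    return 0.5 * features + 0.5 * context

def decode(feat, stride=4, thresh=0.5):
    """Toy 'decoder': upsample back to frame size, threshold to a binary mask."""
    up = feat.repeat(stride, axis=0).repeat(stride, axis=1)
    return (up > thresh).astype(np.uint8)

def subtract_background(frames):
    """Many-to-many: N input frames in, N foreground masks out."""
    feats = np.stack([encode(f) for f in frames])  # (N, h', w')
    feats = stit(feats)                            # mix temporal context
    return [decode(f) for f in feats]              # one mask per frame

# Usage: 5 consecutive 64x64 frames yield 5 masks of the same size.
frames = [np.random.rand(64, 64) for _ in range(5)]
masks = subtract_background(frames)
```

The key property this sketch preserves is the many-to-many mapping: every input frame receives its own output mask, with the temporal module operating jointly on the whole feature stack.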