Abstract

Object detection generally achieves promising results using spatial information alone, but foreground object detection in visual surveillance requires proper use of temporal information in addition to spatial information. Recently, deep learning-based visual surveillance algorithms have shown better results than traditional background subtraction (BGS) algorithms in environments similar to the training environment; in unseen environments, however, they perform worse than BGS algorithms. This paper proposes an algorithm that improves performance in unseen environments by integrating spatial and temporal information. We propose a spatio-temporal fusion network (STFN) that extracts temporal and spatial information with 3D and 2D networks, respectively. We also propose a method for stable training of the proposed STFN using a semi-foreground map. STFN can generate an appropriate background model image and operates in real time on a desktop with a GPU. Experiments on various public datasets demonstrate that the proposed algorithm performs well in environments that differ from the training one.
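To make the two-branch idea concrete, below is a minimal PyTorch sketch of fusing per-frame spatial features from a 2D network with short-clip temporal features from a 3D network into a per-pixel foreground probability map. All module names, layer sizes, and the fusion scheme here are illustrative assumptions, not the paper's actual STFN architecture.

```python
import torch
import torch.nn as nn

class STFNSketch(nn.Module):
    """Illustrative spatio-temporal fusion: a 2D branch for per-frame
    spatial features, a 3D branch for short-clip temporal features,
    and a fusion head that predicts a foreground probability map.
    Layer sizes and the fusion scheme are assumptions."""

    def __init__(self, in_ch=3, feat=16):
        super().__init__()
        # 2D branch: spatial features from the most recent frame
        self.spatial = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        # 3D branch: temporal features from a short clip of frames
        self.temporal = nn.Sequential(
            nn.Conv3d(in_ch, feat, (3, 3, 3), padding=(1, 1, 1)), nn.ReLU(),
            nn.Conv3d(feat, feat, (3, 3, 3), padding=(1, 1, 1)), nn.ReLU(),
        )
        # Fusion head: concatenate both branches, predict a 1-channel mask
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 1, 1),
        )

    def forward(self, clip):
        # clip: (batch, channels, time, height, width)
        spatial = self.spatial(clip[:, :, -1])       # features of latest frame
        temporal = self.temporal(clip).mean(dim=2)   # average-pool over time
        logits = self.fuse(torch.cat([spatial, temporal], dim=1))
        return torch.sigmoid(logits)                 # foreground probability

# Example: a clip of 8 RGB frames at 240x320 resolution
model = STFNSketch()
mask = model(torch.randn(1, 3, 8, 240, 320))
print(mask.shape)  # torch.Size([1, 1, 240, 320])
```

Concatenation followed by convolutions is only one plausible fusion choice; the paper's STFN and its semi-foreground training map may combine the branches differently.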
