Abstract

Foreground segmentation is a crucial first step for many video analysis methods such as action recognition and object tracking. In the past five years, convolutional neural network based foreground segmentation methods have achieved major breakthroughs. However, most of them focus on stationary cameras and have limited performance on pan–tilt–zoom (PTZ) cameras. In this paper, an end-to-end deep features homography transformation and fusion network based foreground segmentation method (HTFnetSeg) is proposed for surveillance videos recorded by PTZ cameras. At the core of HTFnetSeg is the combination of an unsupervised semantic attention homography estimation network (SAHnet) for frame alignment and a spatial transformed deep features fusion network (STDFFnet) for segmentation. The semantic attention mask in SAHnet drives the network to focus on background alignment by reducing the noise that comes from the foreground. To reduce algorithm complexity, STDFFnet reuses the deep features extracted during the semantic attention mask generation step, using a spatial transformation technique to align the features rather than only the frames. Additionally, a conservative strategy is proposed for the motion map based post-processing step to further reduce the false positives caused by semantic noise. Experiments on both CDnet2014 and Lasiesta show that our method outperforms many state-of-the-art methods, both quantitatively and qualitatively.
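The idea of aligning deep features rather than only frames can be illustrated with a minimal NumPy sketch. It is not the authors' STDFFnet: the function names `scale_homography` and `warp_features` are hypothetical, the homography is assumed to be already estimated at image resolution, and the feature map is assumed to be downsampled by a uniform factor `s`. A real network would use a differentiable spatial transformer layer instead of this plain inverse warp with bilinear sampling.

```python
import numpy as np

def scale_homography(H, s):
    """Adapt a homography estimated at image resolution to a feature map
    downsampled by factor s (assumption: feature coords = image coords / s)."""
    S = np.diag([1.0 / s, 1.0 / s, 1.0])
    return S @ H @ np.linalg.inv(S)

def warp_features(feat, H):
    """Warp a feature map of shape (C, h, w) with homography H, which maps
    source (x, y) coordinates to target coordinates. Uses inverse mapping
    with bilinear sampling; locations that fall outside the source are zero."""
    C, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Homogeneous target coordinates, shape (3, h*w).
    tgt = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Map every target location back into the source feature map.
    src = np.linalg.inv(H) @ tgt
    sx, sy = src[0] / src[2], src[1] / src[2]
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    wx, wy = sx - x0, sy - y0
    out = np.zeros((C, h * w))
    # Accumulate the four bilinear neighbours of each source location.
    corners = (
        (0, 0, (1 - wx) * (1 - wy)),
        (0, 1, wx * (1 - wy)),
        (1, 0, (1 - wx) * wy),
        (1, 1, wx * wy),
    )
    for dy, dx, weight in corners:
        xi, yi = x0 + dx, y0 + dy
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out[:, valid] += weight[valid] * feat[:, yi[valid], xi[valid]]
    return out.reshape(C, h, w)

# Usage: an 8-pixel horizontal shift at image resolution, applied to
# features that are 4x smaller than the input frame.
H_img = np.array([[1.0, 0.0, 8.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
features = np.random.rand(16, 30, 40)
aligned = warp_features(features, scale_homography(H_img, 4))
```

Warping the features directly means the feature extractor runs once per frame, and only the cheap resampling step is repeated for alignment, which is the kind of saving the abstract attributes to feature reuse.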

Highlights

  • Foreground segmentation is an active research topic in computer vision [1], as it is a stepping stone for video surveillance and many video analysis methods by extracting useful information from videos

  • FgSegNet [12], the top convolutional neural network (CNN)-based foreground segmentation method on CDnet2014 with source code opened to the public, and image-completion network (ICNET)-CDNET [28], the deep change detection algorithm based on the collaboration of an image completion network and a change detection network, on the dataset mentioned above

  • We propose an end-to-end foreground segmentation neural network, HTFnetSeg, for PTZ cameras, in which a homography estimation network is combined with the deep feature fusion network to extend DFFnetSeg to a wider range of camera motion by overcoming its weakness under continuous camera motion


Summary

Introduction

Foreground segmentation is an active research topic in computer vision [1], as it is a stepping stone for video surveillance and many video analysis methods by extracting useful information from videos. Most foreground segmentation methods are designed under the assumption that cameras are stationary. Multi-camera systems are adopted for video surveillance and object tracking tasks to overcome this limitation, because they can cover different angles and shots. Despite these advantages, multi-camera systems bring challenges of their own, such as installation cost, multi-camera collaboration, and object re-identification.

