Abstract

Foreground segmentation in video frames is valuable for object and activity recognition, but existing approaches often demand training data or initial annotation, which is expensive and inconvenient. We propose an automatic, unsupervised method for foreground segmentation in an unlabeled, short video. Pixel-level optical flow and binary mask features are converted into probabilistic superpixel-level features, which are then used to build a superpixel-level conditional random field that labels foreground and background. Exploiting the fact that the appearance and motion features of a moving object are generally coherent in time and space, we construct an object-like pool and a background-like pool from previously segmented results. These continuously updated pools serve as "prior" knowledge for the current frame and provide a reliable way to learn the object's features. Experimental results demonstrate that our approach outperforms current methods, both qualitatively and quantitatively.
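The first step described above, converting pixel-level features into superpixel-level probabilities, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `superpixel_foreground_prob` and the max-normalization scheme are assumptions, and real pipelines would use an optical-flow estimator and a superpixel algorithm (e.g., SLIC) to produce the inputs.

```python
def superpixel_foreground_prob(labels, flow_mag):
    """Aggregate per-pixel motion into per-superpixel foreground probabilities.

    labels   -- 2-D list of superpixel ids, one per pixel
    flow_mag -- 2-D list of optical-flow magnitudes, same shape
    Returns a dict mapping superpixel id -> probability in [0, 1].
    """
    sums, counts = {}, {}
    for row_l, row_m in zip(labels, flow_mag):
        for sp, m in zip(row_l, row_m):
            sums[sp] = sums.get(sp, 0.0) + m
            counts[sp] = counts.get(sp, 0) + 1
    # Mean flow magnitude per superpixel.
    means = {sp: sums[sp] / counts[sp] for sp in sums}
    # Normalize by the peak so strongly moving superpixels approach 1.0;
    # these values could serve as unary potentials in a superpixel-level CRF.
    peak = max(means.values()) or 1.0
    return {sp: means[sp] / peak for sp in means}


# Toy 2x3 frame with two superpixels: id 0 (static) and id 1 (moving).
labels = [[0, 0, 1],
          [0, 1, 1]]
flow = [[0.1, 0.1, 2.0],
        [0.1, 1.8, 2.2]]
probs = superpixel_foreground_prob(labels, flow)
```

In the paper's pipeline these per-superpixel probabilities would be combined with appearance cues from the object-like and background-like pools before CRF inference; the sketch covers only the feature-aggregation step.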

Highlights

  • Video foreground segmentation plays a prerequisite role in a variety of visual applications such as safety surveillance[1] and intelligent transportation.[2]

  • In order to improve the performance of unsupervised and short video segmentation, we propose an online unsupervised learning approach inspired by Ref. 9.

  • This paper aims to segment the moving foreground from an unlabeled and short video in an unsupervised way, without prior knowledge.

Introduction

Video foreground segmentation plays a prerequisite role in a variety of visual applications such as safety surveillance[1] and intelligent transportation.[2] Existing algorithms usually take supervised or semisupervised approaches and achieve satisfying results, but their performance is still limited on unsupervised and short videos. Supervised methods demand many training examples that are expensive to label manually, and semisupervised methods require accurate object-region annotation for the first frame, from which region tracking segments the remaining frames. Many visual applications such as safety surveillance demand intelligent, unattended operation, which makes initial annotation impractical. Moreover, the available frames may sometimes be insufficient, since objects near the camera can move rapidly into and out of the visual field.
