Abstract

Video object segmentation in real-world scenes is a challenging task due to dynamic environment changes such as uneven illumination, object articulation, and camera motion. In this paper, we propose a method that combines semantic regions produced by a trained deep convolutional neural network with saliency maps and motion cues to segment foreground objects in video sequences. First, the parameters of the deep convolutional network are trained on the PASCAL VOC dataset with human annotations. The training process consists of forward-inference and backward-learning stages, employs the standard stochastic gradient descent algorithm, and the number of training epochs is fixed at 7. Second, the trained network is used to predict the semantic labels of a real-world video sequence at the per-frame level. The inferred semantic regions are combined with the saliency map through a Markov random field to derive the foreground objects in each frame of the video. To assess the segmentation performance of the proposed algorithm, we test it on a video sequence from the FBMS motion segmentation benchmark and compare its segmentation accuracy with that of state-of-the-art video object segmentation algorithms.
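As a concrete illustration of the fusion step, the sketch below combines a per-pixel foreground probability from the semantic network with a saliency map under a binary pairwise Markov random field. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the energy terms, the weighting of the two cues, or the inference method, so the code assumes equal weighting, a Potts smoothness term, and an approximate iterated conditional modes (ICM) solver.

    import numpy as np

    def fuse_mrf(semantic_prob, saliency, beta=2.0, n_iters=5):
        """Fuse a per-pixel foreground probability from the semantic
        network with a saliency map via a binary pairwise MRF,
        minimized approximately with iterated conditional modes.

        semantic_prob, saliency: H x W arrays in [0, 1].
        (Equal weighting of the two cues is an assumption; the paper
        does not specify how they are combined.)
        """
        eps = 1e-6
        # Unary term: negative log of the averaged foreground evidence.
        fg = 0.5 * (semantic_prob + saliency)
        unary_fg = -np.log(fg + eps)        # cost of labeling a pixel foreground
        unary_bg = -np.log(1.0 - fg + eps)  # cost of labeling a pixel background

        labels = (fg > 0.5).astype(np.uint8)  # initialize from the fused evidence

        for _ in range(n_iters):
            # Gather the 4-connected neighbor labels of each pixel.
            padded = np.pad(labels, 1, mode="edge")
            neighbors = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                                  padded[1:-1, :-2], padded[1:-1, 2:]])
            # Potts pairwise term: pay beta per disagreeing neighbor.
            cost_fg = unary_fg + beta * np.sum(neighbors == 0, axis=0)
            cost_bg = unary_bg + beta * np.sum(neighbors == 1, axis=0)
            labels = (cost_fg < cost_bg).astype(np.uint8)

        return labels

The ICM loop is chosen here only for self-containment; an exact solver such as graph cuts would minimize the same binary energy globally.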
