Abstract

Effectively utilizing the common information in a set of video frames is vital for video segmentation. However, existing methods that transfer common information from a prior frame to the current frame do not exploit it effectively. To address this issue, in this letter we propose a proposal-driven framework that processes two video frames simultaneously and jointly segments objects through a convolutional neural network (CNN), thereby exploiting the common information between the frames. Moreover, proposals extracted from the video frames prove useful for refining the segmentation results, since their segmentation results can be fused with those of the full frames. In our framework, a Faster R-CNN generates proposals together with their features, and an L2 loss function is used to establish proposal pairs between the two selected frames. A newly trained ResNet then retains only the proposal pairs that contain the same content, and the PSPNet segmentation model generates segmentation results for both the frames and the proposals. Finally, the proposals’ segmentation results are refined using the frames’ segmentation results. The VOT 2016 segmentation dataset, the DAVIS 2017 dataset, and the SegTrack v2 dataset were used for training and testing our framework. Experimental results show that our proposal-driven segmentation framework achieves higher accuracy on video segmentation benchmarks than existing video segmentation methods.
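The abstract does not give implementation details for the L2-based pairing step, but the idea of matching proposals across two frames by feature distance can be illustrated with a minimal sketch. The function name `pair_proposals` and the nearest-neighbor matching rule are assumptions for illustration only; the actual letter learns the pairing with an L2 loss during training rather than computing it post hoc.

```python
import numpy as np

def pair_proposals(feats_a, feats_b):
    """Pair proposals from two frames by minimal L2 feature distance.

    feats_a: (m, d) array of proposal features from frame A.
    feats_b: (n, d) array of proposal features from frame B.
    Returns, for each proposal in A, the index of its nearest
    proposal in B and the corresponding L2 distance.
    """
    # Pairwise L2 distances via broadcasting: (m, n) matrix
    diff = feats_a[:, None, :] - feats_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Greedy nearest-neighbor assignment for each proposal in A
    return dist.argmin(axis=1), dist.min(axis=1)
```

In the full framework, such candidate pairs would then be filtered by the trained ResNet, which keeps only pairs whose proposals contain the same content.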
