Abstract

To achieve accurate segmentation in unconstrained videos, we propose a novel segmentation framework based on a two-stream deep convolutional network. Our algorithm exploits the object's robust pixel-level features across all video frames and generates detailed foreground likelihood maps. First, a two-stream video segmentation network built on multiple hierarchical features generates initial segmentation masks. Then, the initial segmentation masks and their corresponding original frames are collected to learn an appearance model by least-squares regression, and this model computes an appearance likelihood map for every frame. Finally, each pair of initial segmentation mask and appearance likelihood map is fused by a proposed fusion network to produce the final high-quality segmentation map. Experiments on the challenging DAVIS dataset verify the effectiveness of the appearance regression and demonstrate that the proposed algorithm outperforms state-of-the-art methods.
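For concreteness, the sketch below illustrates the appearance-regression step under simple assumptions: per-pixel RGB values (plus a bias term) serve as features, and the linear model is solved in closed form with a small ridge regularizer. The feature choice, the regularizer lam, and the function names are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def fit_appearance_model(frames, masks, lam=1e-3):
        """Fit a linear appearance model by regularized least squares.

        frames: list of (H, W, 3) float arrays in [0, 1]
        masks:  list of (H, W) binary initial segmentation masks
        NOTE: per-pixel RGB + bias features are an assumption; the paper's
        exact feature design is not specified in the abstract.
        """
        # Stack per-pixel features X (with a bias column) and targets y.
        X = np.concatenate([f.reshape(-1, 3) for f in frames], axis=0)
        X = np.hstack([X, np.ones((X.shape[0], 1))])
        y = np.concatenate([m.reshape(-1) for m in masks], axis=0).astype(np.float64)
        # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
        A = X.T @ X + lam * np.eye(X.shape[1])
        return np.linalg.solve(A, X.T @ y)

    def appearance_likelihood(frame, w):
        """Compute an appearance likelihood map for a single frame."""
        H, W, _ = frame.shape
        X = np.hstack([frame.reshape(-1, 3), np.ones((H * W, 1))])
        return np.clip(X @ w, 0.0, 1.0).reshape(H, W)

In the framework described above, each resulting likelihood map would then be paired with the corresponding initial segmentation mask and passed to the fusion network.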

