Abstract

Video object segmentation (VOS) aims to separate unknown target objects from given video sequences. Although many recent methods have boosted the performance of VOS, especially those using deep convolutional neural networks (CNNs), it remains difficult to aggregate deep features together with motion cues effectively, which is important for associating valid information across adjacent frames in a video sequence. To tackle this problem, we propose a simple yet effective feature optimization method for VOS based on motion information. Specifically, we construct a two-branch deep network and use computed motion cues (i.e., optical flow) to jointly optimize global and local interframe correlation information. Additionally, a clustering-based feature enhancement module is proposed to further fuse motion information and enhance the feature saliency of the target area. The optimized feature maps yield a significant performance improvement on the final VOS tasks, especially those with rapid target movement. Experiments on the DAVIS16, DAVIS17, YouTube-Objects and YouTube-VOS datasets demonstrate that our simple feature aggregation and enhancement method improves segmentation accuracy effectively and achieves competitive results compared with many state-of-the-art methods.
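The abstract does not give implementation details, but the core idea of using optical flow to associate features across adjacent frames can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' method: it warps a previous frame's feature map toward the current frame with a backward flow field (nearest-neighbour sampling) and blends the two, the function names `warp_features` and `fuse` and the blending weight `alpha` being hypothetical.

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a feature map (H, W, C) toward the current frame using a
    backward optical-flow field (H, W, 2) with nearest-neighbour sampling.
    This is an illustrative stand-in for the flow-guided alignment the
    abstract alludes to, not the paper's actual module."""
    h, w, _ = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # For each target pixel, read from the flow-displaced source location,
    # clipped to the image bounds.
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    return feat[src_y, src_x]

def fuse(curr_feat, prev_feat, flow, alpha=0.5):
    """Blend current-frame features with flow-aligned previous-frame
    features; `alpha` is an assumed fixed mixing weight."""
    return alpha * curr_feat + (1 - alpha) * warp_features(prev_feat, flow)
```

In the paper such alignment would be learned end-to-end inside the two-branch network; this sketch only shows why flow lets features from frame t-1 be put in correspondence with frame t before fusion.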
