Integrating instance-level knowledge to see the unseen: A two-stream network for video object segmentation

Hannan Lu, Zhi Tian, Pengxu Wei, Haibing Ren, Wangmeng Zuo

doi:10.1016/j.neucom.2024.127878

Abstract

Existing matching-based video object segmentation (VOS) approaches are inherently limited in segmenting pixels that have never appeared in previous frames (i.e., unseen pixels). In this paper, we introduce a Two-Stream Network (TSN) that addresses this issue by softly distinguishing between seen and unseen pixels and processing them with two separate streams. Specifically, a pixel division module generates a routing map that distinguishes seen from unseen pixels. Guided by the routing map, TSN explicitly integrates instance-level knowledge from an instance stream with pixel-level information from a pixel stream to generate the final segmentation result. The soft partitioning strategy allows flexibility and adaptability in the fusion process. Additionally, the compact instance stream encodes and leverages instance-level knowledge, improving segmentation accuracy on unseen pixels. Extensive experiments demonstrate the effectiveness of the proposed TSN, and we report state-of-the-art performance on public VOS benchmarks.
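The routing-map-guided fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fusion formula, so the per-pixel convex combination below is an assumption, as is the convention that a routing weight near 1 marks an unseen pixel handled by the instance stream. The function name `fuse_streams` and the toy 2x2 inputs are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # a tiny H x W map of per-pixel values


def fuse_streams(routing: Grid, instance_prob: Grid, pixel_prob: Grid) -> Grid:
    """Blend two per-pixel foreground probabilities with a soft routing map.

    Assumption (not stated in the abstract): the fused value is the convex
    combination r * instance + (1 - r) * pixel, where r in [0, 1] is the
    routing weight and r close to 1 means "unseen pixel".
    """
    fused = []
    for r_row, i_row, p_row in zip(routing, instance_prob, pixel_prob):
        fused_row = []
        for r, inst, pix in zip(r_row, i_row, p_row):
            if not 0.0 <= r <= 1.0:
                raise ValueError("routing weights must lie in [0, 1]")
            # Soft partition: both streams contribute, weighted by r.
            fused_row.append(r * inst + (1.0 - r) * pix)
        fused.append(fused_row)
    return fused


# Toy example: hypothetical outputs of the two streams on a 2x2 frame.
routing = [[0.0, 1.0], [0.5, 0.25]]
instance_prob = [[1.0, 1.0], [1.0, 0.0]]
pixel_prob = [[0.0, 0.0], [0.0, 1.0]]

fused = fuse_streams(routing, instance_prob, pixel_prob)
print(fused)
```

Under this assumed convention, a weight of 0 falls back entirely on the pixel stream and a weight of 1 entirely on the instance stream. Intermediate weights mix the two, which is the sense in which the partition is soft rather than a hard seen/unseen mask.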

Full Text