Fast Video Object Segmentation via Dynamic YOLACT

Tianfang Meng, Wenqiang Zhang

doi:10.1109/icassp43922.2022.9747260

Abstract

Video object segmentation (VOS) is a fundamental task in video recognition with many practical applications. It aims to predict segmentation masks for multiple objects throughout an entire video. Recent VOS research has achieved remarkable performance. However, because VOS is a video processing task, inference speed is also essential. VOS can be considered an extension of semantic segmentation from a static image to a dynamic image sequence. Following this idea, we propose a fast VOS framework based on YOLACT, a real-time static image segmentation framework. We employ a fast online training technique that adapts YOLACT to dynamic video sequences and achieves competitive performance (77.2 J&F and 30.9 FPS on DAVIS17) among fast VOS methods. Moreover, by linearly combining mask bases to generate masks for arbitrary objects, our method can process multi-object videos with minimal extra computation.
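The linear combination of mask bases can be illustrated with a minimal sketch. It assumes a YOLACT-style formulation in which each object's mask is a sigmoid applied to a weighted sum of shared bases. The function name `combine_mask_bases`, the toy bases, and the coefficient values are hypothetical and chosen only for illustration. How the coefficients are obtained (for example, by the online training step) is not shown.

```python
import math

def combine_mask_bases(bases, coeffs):
    """Linearly combine K shared mask bases into one soft mask per object.

    bases:  K bases, each an H x W grid (list of rows) of floats
    coeffs: N coefficient vectors of length K, one per object (assumed given)
    returns N soft masks, each H x W, with values in (0, 1)
    """
    height, width = len(bases[0]), len(bases[0][0])
    masks = []
    for object_coeffs in coeffs:
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # Weighted sum of all bases at this pixel.
                logit = sum(c * b[y][x] for c, b in zip(object_coeffs, bases))
                # Sigmoid squashes the sum into a soft mask value.
                row.append(1.0 / (1.0 + math.exp(-logit)))
            mask.append(row)
        masks.append(mask)
    return masks

# Two toy 2x2 bases: one responds at the top-left, one at the bottom-right.
bases = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
# Two objects, each described only by a length-2 coefficient vector.
coeffs = [[4.0, -4.0], [-4.0, 4.0]]

masks = combine_mask_bases(bases, coeffs)
# Threshold the soft masks at 0.5 to obtain binary masks.
binary = [[[int(v > 0.5) for v in row] for row in m] for m in masks]
```

In this formulation the bases are computed once per frame and shared by all objects, so each additional object costs only one more weighted sum, which is consistent with the claim that multi-object videos add little extra computation.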

Full Text