Conventional deep-neural-network-based video object segmentation (VOS) methods rely on heavily fine-tuning a segmentation model on the first frame of a given video, which is time-consuming and inefficient. In this paper, we propose a novel method that rapidly adapts a base segmentation model to new video sequences with only a couple of model-update iterations, without sacrificing performance. This efficiency stems from two components: a meta-learning paradigm that yields a meta segmentation model, and a novel continual learning approach that enables online adaptation of the segmentation model. Concretely, we train a meta-learner on multiple VOS tasks so that the meta model captures their common knowledge and gains the ability to quickly adapt the segmentation model to new video sequences. Furthermore, to address the challenges that temporal variations in a video (e.g., object motion and appearance changes) pose to VOS, we propose a principled online adaptation approach that continuously adapts the segmentation model across video frames by effectively exploiting temporal context, providing robustness to these variations. Integrating the meta-learner with the online adaptation approach, the proposed VOS model achieves performance competitive with the state of the art while offering faster per-frame processing speed.
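The following is a minimal sketch of the two ideas the abstract describes: few-step adaptation of a meta-trained segmentation model to the first annotated frame, followed by continual online updates on subsequent frames. All names here (`SegNet`, `adapt`, the pseudo-label heuristic) are hypothetical illustrations, not the authors' actual implementation, and the meta-training outer loop is omitted.

```python
# Sketch of MAML-style fast adaptation plus online updates for VOS.
# Assumes PyTorch; SegNet is a toy stand-in for a real segmentation network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegNet(nn.Module):
    """Toy segmentation model producing per-pixel foreground logits."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


def adapt(model, frame, mask, steps=2, inner_lr=1e-2):
    """Adapt a copy of the model to one annotated frame with only a
    couple of gradient steps (the 'fast adaptation' in the abstract)."""
    fast = SegNet()
    fast.load_state_dict(model.state_dict())
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(steps):
        loss = F.binary_cross_entropy_with_logits(fast(frame), mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return fast


# Usage: adapt the meta-trained model to the first frame, then keep
# adapting online, using the previous prediction as a pseudo-label
# (a simple stand-in for the paper's temporal-context exploitation).
meta_model = SegNet()  # in practice, weights come from meta-training
frame0 = torch.randn(1, 3, 64, 64)
mask0 = torch.rand(1, 1, 64, 64).round()
model = adapt(meta_model, frame0, mask0)

for t in range(3):  # simulated video stream
    frame_t = torch.randn(1, 3, 64, 64)
    pseudo = torch.sigmoid(model(frame_t)).detach().round()
    model = adapt(model, frame_t, pseudo, steps=1)  # online update
```

The design point illustrated here is that adaptation costs only a handful of gradient steps per video (and per frame, online), rather than the hundreds of fine-tuning iterations conventional first-frame methods require.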