Abstract

The problem of semi-supervised video object segmentation is commonly tackled by fine-tuning a general-purpose segmentation deep network on the annotated first frame using hundreds of iterations of gradient descent. This time-consuming fine-tuning process, however, makes these methods difficult to use in practical applications. We propose a novel architecture called Annotation Guided U-net (AGUnet) for fast one-shot video object segmentation (VOS). AGUnet can quickly adapt a model trained on static images to segmenting a given target in a video with only a few iterations of gradient descent. AGUnet is inspired by interactive image segmentation, where the target of interest is segmented with the help of user-annotated foreground regions. In AGUnet, however, a fully-convolutional Siamese network automatically annotates the foreground and background regions, and this annotation information is fused into the skip connections of a U-net for VOS. AGUnet can be trained end-to-end on static images, rather than on video sequences as many previous methods require. Experiments show that AGUnet runs much faster than current state-of-the-art one-shot VOS algorithms while achieving competitive accuracy, and that it generalizes well.
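
To make the architectural description concrete, the following PyTorch sketch shows one way a Siamese annotation branch could be fused into the skip connections of a U-net, as the abstract describes. It is a minimal illustration under stated assumptions, not the authors' implementation: all module names (`AGUnetSketch`, `conv_block`), channel counts, and the simple correlation-based annotation scheme are hypothetical.

```python
# Illustrative sketch of the AGUnet idea (assumed design, not the paper's exact one):
# a shared (Siamese) embedding correlates a first-frame target template with the
# current frame to produce a coarse foreground "annotation" map, which is then
# concatenated into each U-net skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class AGUnetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.bottleneck = conv_block(64, 128)
        # Shared embedding applied to both the template and the current frame.
        self.embed = conv_block(3, 32)
        # Decoder blocks take one extra channel per skip for the annotation map.
        self.dec2 = conv_block(128 + 64 + 1, 64)
        self.dec1 = conv_block(64 + 32 + 1, 32)
        self.head = nn.Conv2d(32, 1, 1)

    def annotate(self, template, frame):
        # Correlate a pooled template descriptor with frame features to get a
        # coarse per-pixel foreground likelihood (the "annotation" map).
        t = F.adaptive_avg_pool2d(self.embed(template), 1)  # (B, 32, 1, 1)
        f = self.embed(frame)                               # (B, 32, H, W)
        return torch.sigmoid((f * t).sum(1, keepdim=True))  # (B, 1, H, W)

    def forward(self, frame, template):
        ann = self.annotate(template, frame)
        e1 = self.enc1(frame)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        b = self.bottleneck(F.max_pool2d(e2, 2))
        # Fuse the annotation map into each skip connection at matching scale.
        up = lambda x, ref: F.interpolate(
            x, size=ref.shape[-2:], mode='bilinear', align_corners=False)
        d2 = self.dec2(torch.cat([up(b, e2), e2, up(ann, e2)], 1))
        d1 = self.dec1(torch.cat([up(d2, e1), e1, up(ann, e1)], 1))
        return self.head(d1)  # per-pixel foreground logits
```

In a one-shot setting along the lines the abstract describes, such a network would first be trained on static images, then adapted to a new video by a few gradient-descent steps on the annotated first frame before segmenting the remaining frames.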
