Abstract

We present a system for the removal of objects from videos. As input, the system only needs a user to draw a few strokes on the first frame, roughly delimiting the objects to be removed. To the best of our knowledge, this is the first system allowing the semi-automatic removal of objects from videos with complex backgrounds. The key steps of our system are the following: after initialization, segmentation masks are first refined and then automatically propagated through the video. Missing regions are then synthesized using video inpainting techniques. Our system can deal with multiple, possibly crossing objects, with complex motions, and with dynamic textures. This results in a computational tool that can alleviate tedious manual operations for editing high-quality videos.
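The stroke-driven refinement step can be illustrated with a small sketch. The Introduction describes it as a watershed transform over a CNN edge map that yields superpixels, which the user then selects. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The edge map is taken as given, so no CNN is included. Watershed seeds sit on a regular grid. A superpixel counts as selected when any stroke pixel falls inside it. `watershed_superpixels` and `mask_from_strokes` are hypothetical names.

```python
import heapq


def watershed_superpixels(edges, step):
    """Marker-controlled watershed (priority flooding) on a 2D edge map.

    edges: list of rows of floats, high values on object boundaries
           (assumed to come from an edge detector; none is included here).
    step:  spacing of the seed grid (an assumption made for this sketch).
    Returns a label map with one integer label per superpixel.
    """
    h, w = len(edges), len(edges[0])
    labels = [[-1] * w for _ in range(h)]
    heap = []
    next_label = 0
    # Hypothetical seeding: one seed per grid cell, one superpixel per seed.
    for y in range(step // 2, h, step):
        for x in range(step // 2, w, step):
            labels[y][x] = next_label
            heapq.heappush(heap, (edges[y][x], y, x))
            next_label += 1
    # Expand labeled pixels in order of increasing edge strength, so a
    # region only crosses a strong edge after all weaker paths are used up.
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1:
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (edges[ny][nx], ny, nx))
    return labels


def mask_from_strokes(labels, strokes):
    """Binary mask: union of the superpixels hit by any stroke pixel (y, x)."""
    chosen = {labels[y][x] for y, x in strokes}
    return [[lab in chosen for lab in row] for row in labels]


# Toy 6x6 edge map with one strong vertical edge at column 3.
edges = [[1.0 if x == 3 else 0.0 for x in range(6)] for _ in range(6)]
labels = watershed_superpixels(edges, step=3)
# Two strokes on the left side of the edge select the two left superpixels.
mask = mask_from_strokes(labels, strokes=[(1, 1), (4, 1)])
```

In this toy example the selected mask covers columns 0 to 2 and stops at the strong edge, so nothing right of column 3 is included. Snapping the mask to superpixel boundaries is what lets a few rough strokes yield a mask that follows object contours.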

Highlights

  • In this paper, we propose a system to remove one or more objects from a video, starting with only a few user annotations

  • We evaluate our method on various datasets, for both object segmentation and object removal

  • The process of extracting space–time segments corresponding to objects is a widely studied topic whose complete review is beyond the scope of this paper


Summary

Introduction

We propose a system to remove one or more objects from a video, starting with only a few user annotations. To refine the segmentation mask, we use a classical strategy relying on a CNN-based edge detector, followed by a watershed transform yielding superpixels, which the user then selects. After this step, a label is given to each object. For completion, we employ two strategies: motion-based pixel propagation for the static background, and patch-based video completion for dynamic textures. Both methods rely heavily on knowledge of the segmented objects. This interplay between object segmentation and the completion scheme improves the method in several ways: it allows for better video stabilization, faster and more accurate search for similar patches, and more accurate foreground–background separation. These improvements yield completion results with very little or no temporal incoherence. A shorter version of this work can be found in Ref. [2].
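The idea behind motion-based pixel propagation for a static background can be sketched as follows: a pixel hidden by the removed object in one frame is copied from another frame where the same background point is visible. This is a simplified illustration, not the paper's method. It assumes the frames are already stabilized, so a background point occupies the same (y, x) in every frame. That identity mapping stands in for the motion estimation a real system would use. `nearest_visible_frame` and `propagate_background` are hypothetical names.

```python
def nearest_visible_frame(masks, t, y, x):
    """Index of the frame closest to t in which pixel (y, x) is unmasked.

    Ties are broken in favor of the earlier frame; returns None if the
    pixel is masked in every frame.
    """
    n = len(masks)
    for d in range(1, n):
        for s in (t - d, t + d):
            if 0 <= s < n and not masks[s][y][x]:
                return s
    return None


def propagate_background(frames, masks):
    """Fill masked pixels by copying them from the nearest visible frame.

    frames: list of frames, each a list of rows of pixel values.
    masks:  same shape, True where the removed object covers the pixel.
    Assumes stabilized frames (identity motion). Pixels that are never
    visible are left as None; this sketch does not fill them.
    """
    out = [[row[:] for row in frame] for frame in frames]  # inputs untouched
    for t, mask in enumerate(masks):
        for y, row in enumerate(mask):
            for x, covered in enumerate(row):
                if covered:
                    s = nearest_visible_frame(masks, t, y, x)
                    out[t][y][x] = None if s is None else frames[s][y][x]
    return out


# A one-row, three-frame clip: an object (value 9) moves across a static
# background [1, 2, 3].
frames = [[[9, 2, 3]], [[1, 9, 3]], [[1, 2, 9]]]
masks = [[[True, False, False]], [[False, True, False]], [[False, False, True]]]
filled = propagate_background(frames, masks)
```

Here every hidden pixel is revealed in a neighboring frame, so each frame of `filled` becomes `[[1, 2, 3]]`. Choosing the temporally nearest source frame is a simple heuristic that keeps copied content close in time. Pixels that are never revealed would need a different completion strategy.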

Video object segmentation
Video editing
Video inpainting
Proposed method
First frame annotation
Object segmentation
Semantic segmentation networks
Multiple object tracking
Object removal
Dynamic background
Results