Abstract
Video inpainting gains an increasing amount of attention ascribed to its wide applications in intelligent video editing. However, despite tremendous progress made in RGB video inpainting, the existing RGB-D video inpainting models are still incompetent to inpaint real-world RGB-D videos, as they simply fuse color and depth via explicit feature concatenation, neglecting the natural modality gap. Moreover, current RGB-D video inpainting datasets are synthesized with homogeneous and delusive RGB-D data, which is far from real-world application and cannot provide comprehensive evaluation. To alleviate these problems and achieve real-world RGB-D video inpainting, on one hand, we propose a Mutually-guided Color and Depth Inpainting Network (MCD-Net), where color and depth are reciprocally leveraged to inpaint each other implicitly, mitigating the modality gap and fully exploiting cross-modal association for inpainting. On the other hand, we build a Video Inpainting with Depth (VID) dataset to supply diverse and authentic RGB-D video data with various object annotation masks to enable comprehensive evaluation for RGB-D video inpainting under real-world scenes. Experimental results on the DynaFill benchmark and our collected VID dataset demonstrate our MCD-Net not only yields the state-of-the-art quantitative performance but successfully achieves high-quality RGB-D video inpainting under real-world scenes. All resources are available at https://github.com/JCATCV/MCD-Net.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.