Abstract

Maintaining spatial and temporal consistency in the inpainted region of a video is a challenging problem. Recent research focuses on flow information to synthesize temporally smooth pixels while neglecting semantic structural coherence across frames, so the results suffer from over-smoothing and shadowy outlines that significantly degrade inpainted video quality. To overcome this problem, we propose an end-to-end consistent video inpainting model that substantially improves the inpainted region. The model employs a deep encoder (DE), an axial attention block (AAB), a style transformer, and a decoder to produce inpainted video with realistic structure. The deep encoder extracts features effectively, while the axial attention block refines the retrieved features by merging recoverable multi-scale characteristics with local spatial structures. A novel style transformer with a style manipulation block (SMB) then fills the missing region with rich visual detail and temporal coherence. We assess the model's performance on two publicly available benchmark datasets. Experimental results demonstrate that our method outperforms state-of-the-art methods by a large margin, and an extensive ablation study further validates the model's design.
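The abstract describes an encoder followed by an axial attention block. Since the paper's actual AAB design (channel widths, head counts, multi-scale fusion) is not specified here, the following is only a minimal PyTorch sketch of the general axial-attention idea: factorizing 2-D self-attention into a height-axis pass followed by a width-axis pass. All class names and hyperparameters are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class AxialAttention(nn.Module):
    """Self-attention along a single spatial axis (height or width).

    Attending over one axis at a time reduces the cost of 2-D attention
    from O((H*W)^2) to O(H*W*(H + W)).
    """

    def __init__(self, dim: int, heads: int = 4, axis: str = "h"):
        super().__init__()
        self.axis = axis
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map, e.g. from the deep encoder.
        b, c, h, w = x.shape
        if self.axis == "h":
            # Treat each column as a sequence of H tokens.
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:
            # Treat each row as a sequence of W tokens.
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        q = self.norm(seq)
        out, _ = self.attn(q, q, q)
        seq = seq + out  # pre-norm residual connection
        if self.axis == "h":
            return seq.reshape(b, w, h, c).permute(0, 3, 2, 1)
        return seq.reshape(b, h, w, c).permute(0, 3, 1, 2)


class AxialAttentionBlock(nn.Module):
    """Height-axis then width-axis attention over an encoder feature map."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.height_attn = AxialAttention(dim, heads, axis="h")
        self.width_attn = AxialAttention(dim, heads, axis="w")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.width_attn(self.height_attn(x))


# Example usage with a hypothetical 64-channel encoder feature map
# (dim must be divisible by heads for nn.MultiheadAttention):
block = AxialAttentionBlock(dim=64, heads=4)
features = torch.randn(2, 64, 32, 32)   # (batch, channels, height, width)
refined = block(features)               # same shape: (2, 64, 32, 32)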
