Abstract

Video style transfer is a challenging task that requires not only stylizing video frames but also preserving temporal consistency among them. Many existing methods resort to optical flow to maintain temporal consistency in stylized videos. However, optical flow is sensitive to occlusions and rapid motions, and estimating it is quite slow, which makes it less practical in real-world applications. In this paper, we propose a novel fast method that enforces both global and local temporal consistency for video style transfer without estimating optical flow. To preserve the temporal consistency of the entire video (i.e., global consistency), we use the structural similarity index instead of optical flow and propose a self-similarity loss that ensures the temporal structure of the stylized video matches that of the source video. Furthermore, to enhance the coherence between adjacent frames (i.e., local consistency), a self-attention mechanism is designed to attend to the previous stylized frame when synthesizing the current frame. Extensive experiments demonstrate that our method generally achieves better visual results and runs faster than state-of-the-art methods, which validates the superiority of simultaneously preserving global and local temporal consistency for video style transfer.
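The global-consistency idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each clip is a list of grayscale frames in [0, 1], uses a simplified global (non-windowed) SSIM, and compares the pairwise SSIM matrices of the source and stylized clips with an L1 distance; the function names (`ssim`, `self_similarity`, `self_similarity_loss`) are hypothetical.

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified global (non-windowed) SSIM between two frames in [0, 1].
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def self_similarity(frames):
    # Pairwise SSIM matrix capturing the temporal structure of a clip.
    n = len(frames)
    s = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s[i, j] = ssim(frames[i], frames[j])
    return s

def self_similarity_loss(source, stylized):
    # L1 distance between the two clips' self-similarity matrices:
    # small when the stylized clip preserves the source's temporal structure.
    return np.abs(self_similarity(source) - self_similarity(stylized)).mean()
```

Because the loss compares relative frame-to-frame similarities rather than pixel correspondences, it needs no motion estimation, which is what lets the method avoid optical flow entirely.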
