BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-Based Video Colorization.

Yixin Yang, Zhongzheng Peng, Jinhui Tang, Xiaoyu Du, Zhulin Tao, Jinshan Pan

doi:10.1109/tpami.2024.3370920

Abstract

Effectively exploring the colors of exemplars and propagating them to each frame is vital for exemplar-based video colorization. In this article, we present BiSTNet, which explores the colors of exemplars and uses them to help video colorization through bidirectional temporal feature fusion guided by a semantic image prior. We first establish the semantic correspondence between each frame and the exemplars in deep feature space to explore color information from the exemplars. Then, we develop a simple yet effective bidirectional temporal feature fusion module to propagate the colors of exemplars into each frame and avoid inaccurate alignment. We note that color-bleeding artifacts usually appear around the boundaries of important objects in videos. To overcome this problem, we develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process. In addition, we develop a multi-scale refinement block to progressively colorize frames in a coarse-to-fine manner. Extensive experimental results demonstrate that the proposed BiSTNet performs favorably against state-of-the-art methods on benchmark datasets and real-world scenes. Moreover, BiSTNet won one championship in the NTIRE 2023 video colorization challenge (Kang et al. 2023).

Full Text