Abstract

Dynamic texture refers to content in video sequences that is characterized by spatial repetition and temporal variation, such as swaying foliage and flowing water. Dynamic textures are difficult to compress efficiently in the current prediction/transform hybrid video coding framework. However, these textures carry little information relevant to machine vision, and human visual perception is less sensitive to textures than to structures. We therefore propose a spatiotemporal generative adversarial network (GAN) based dynamic texture synthesis method for surveillance video coding. At the encoder side, we detect and remove the dynamic texture content, which is irrelevant to machine vision; at the decoder side, we generate the dynamic texture content using the proposed GAN, so that the reconstructed videos can be viewed by humans without degrading perceptual quality. Specifically, we design a GAN that synthesizes dynamic textures by exploiting the correlation between spatial and temporal neighbors; we present a surveillance video coding scheme built on this dynamic texture detection/synthesis method; we build a high-quality dynamic texture dataset; and we collect a dynamic texture testing dataset that goes beyond existing video coding test datasets by focusing on surveillance scenes. The proposed video coding scheme has been implemented on top of the High Efficiency Video Coding (HEVC) reference software. Experiments have been conducted to evaluate the quantitative and qualitative performance of the proposed coding scheme. Our method achieves 7.4% and 7.6% bit-rate savings in the low-delay-B and low-delay-P settings, respectively, at similar visual quality compared with HEVC.
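The encoder-side detect-and-remove step described above might be sketched roughly as follows. Everything here is an illustrative assumption: the abstract does not specify how detection works, so this sketch uses a simple hand-crafted heuristic (high per-pixel temporal variance with a stable block mean, as a proxy for "spatial repetition, temporal variation") rather than the paper's actual detector, and the block size, threshold, and function names (`detect_texture_blocks`, `encode_without_textures`) are invented for the example.

```python
import numpy as np

BLOCK = 16  # block size in pixels; an arbitrary choice for this sketch

def detect_texture_blocks(frames, var_thresh=20.0):
    """Flag blocks whose pixels fluctuate over time while the block mean
    stays stable -- a crude stand-in for dynamic-texture detection.
    `frames` is a (T, H, W) luma array; returns a boolean block-grid mask."""
    f = np.asarray(frames, dtype=np.float64)
    _, h, w = f.shape
    gh, gw = h // BLOCK, w // BLOCK
    mask = np.zeros((gh, gw), dtype=bool)
    for i in range(gh):
        for j in range(gw):
            blk = f[:, i * BLOCK:(i + 1) * BLOCK, j * BLOCK:(j + 1) * BLOCK]
            pixel_var = blk.var(axis=0).mean()        # temporal variation per pixel
            mean_drift = blk.mean(axis=(1, 2)).var()  # stability of the block mean
            mask[i, j] = (pixel_var > var_thresh) and (mean_drift < var_thresh)
    return mask

def encode_without_textures(frames, mask):
    """Encoder side: blank out detected texture blocks so the base codec
    spends no bits on them. In the actual scheme the decoder would then
    resynthesize these regions with the spatiotemporal GAN."""
    out = np.asarray(frames, dtype=np.float64).copy()
    for i, j in zip(*np.nonzero(mask)):
        out[:, i * BLOCK:(i + 1) * BLOCK, j * BLOCK:(j + 1) * BLOCK] = 0.0
    return out
```

In practice the removed regions would be signaled to the decoder as side information, and the GAN generator, not a blanking operation, would fill them using spatially and temporally neighboring reconstructed content.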
