With the development of video technology, a large amount of video data generated from video conferences, sports events, live broadcasts and online classes flows into our daily lives. However, ultra-high-definition video transmission remains a challenge due to limited and unstable network bandwidth, which in turn degrades the quality of video services closely linked with consumer electronic video displays. To address this challenge, we propose a deep-learning-based perceptual quality control approach, which significantly improves video quality and the visual experience under the same bandwidth. The proposed scheme mainly involves saliency region extraction, perception-based bit allocation, and video enhancement. Firstly, we exploit a multi-scale deep convolutional network module to predict the static saliency map that semantically highlights the salient regions. Secondly, we develop a recurrent neural network model to extract the dynamic saliency regions. Finally, a three-level rate allocation scheme is developed based on the resulting saliency guidance, which allocates bits more reasonably by taking the visual characteristics of the human eye into account. Experimental results on a large dataset show that our method achieves an average gain of 1.5 dB on the salient regions without introducing an extra bandwidth burden, which significantly improves the visual experience and paves the way to intelligent video communication.
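To illustrate the idea of saliency-guided, three-level bit allocation, below is a minimal Python sketch that maps a normalized saliency map to per-block QP offsets (lower QP meaning more bits). The block size, thresholds, and offset values are illustrative assumptions for exposition only, not the parameters used in the paper.

```python
import numpy as np

def three_level_qp_offsets(saliency_map, block_size=16,
                           thresholds=(0.3, 0.7), offsets=(2, 0, -2)):
    """Assign a per-block QP offset from a saliency map normalized to [0, 1].

    Low-saliency blocks get a positive offset (fewer bits), mid-saliency
    blocks keep the base QP, and highly salient blocks get a negative
    offset (more bits). All numeric values here are placeholders, not
    the paper's settings.
    """
    h, w = saliency_map.shape
    rows, cols = h // block_size, w // block_size
    qp_offsets = np.zeros((rows, cols), dtype=np.int32)
    for i in range(rows):
        for j in range(cols):
            block = saliency_map[i * block_size:(i + 1) * block_size,
                                 j * block_size:(j + 1) * block_size]
            s = float(block.mean())
            if s < thresholds[0]:
                qp_offsets[i, j] = offsets[0]   # background: coarser quantization
            elif s < thresholds[1]:
                qp_offsets[i, j] = offsets[1]   # mid-level saliency: base QP
            else:
                qp_offsets[i, j] = offsets[2]   # salient region: finer quantization
    return qp_offsets

# Example usage with a random saliency map standing in for the network's output
saliency = np.random.rand(1072, 1920)
print(three_level_qp_offsets(saliency).shape)  # (67, 120) offset grid
```

In practice such per-block offsets would be passed to the encoder's rate control so that the saved bits from background blocks are reallocated to salient regions, keeping the overall bitrate unchanged.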