No-reference video quality assessment based on spatiotemporal slice images and deep convolutional neural networks

Peng Yan, Xuanqin Mou, Zhenrong Zheng, Tsutomu Shimura, Qionghai Dai

doi:10.1117/12.2536866

Abstract

Most learning-based no-reference (NR) video quality assessment (VQA) methods must be trained on a large number of subjective quality scores, but collecting such scores for videos at scale remains difficult. Inspired by the success of full-reference VQA methods that use spatiotemporal slice (STS) images to extract perceptual features and evaluate video quality, this paper adopts multi-directional video STS images, which are images composed of multi-directional sections of the video data, to address the shortage of subjective quality scores. By sampling the STS images of a video into image patches and adding noise to the quality labels of the patches, we propose an NR VQA model based on multi-directional STS images and neural network training. Specifically, first, we select the subjective database that currently contains the largest number of real-distortion videos as the test set. Second, we extract multi-directional STS images from the videos and sample local patches from them to augment the training set; we also add noise to the quality labels of the local patches. Third, we construct and train a deep neural network to obtain a local quality prediction model for each patch in an STS image, and then obtain the quality of an entire video by averaging the model's predictions over the multi-directional STS images. Finally, the experimental results indicate that the proposed method addresses the insufficiency of training samples in small subjective VQA datasets and achieves a high correlation with subjective evaluation.
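
The pipeline described in the abstract (STS extraction, patch sampling, label noise, and averaging of patch predictions) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the video is a grayscale NumPy array of shape (T, H, W), only horizontal and vertical slice directions are used, the label noise is Gaussian, and all function names, the patch size, the stride, and the noise level are hypothetical. The trained network is represented by an arbitrary callable `patch_model`.

```python
import numpy as np


def extract_sts_images(video, num_slices=4):
    """Cut a (T, H, W) video into spatiotemporal slice images.

    Assumption: only two directions are used here (fixed row and fixed
    column); the paper's multi-directional scheme may include others.
    """
    _, h, w = video.shape
    rows = np.linspace(0, h - 1, num_slices).astype(int)
    cols = np.linspace(0, w - 1, num_slices).astype(int)
    horizontal = [video[:, r, :] for r in rows]  # each slice is (T, W)
    vertical = [video[:, :, c] for c in cols]    # each slice is (T, H)
    return horizontal + vertical


def sample_patches(sts, patch_size, stride):
    """Sample square patches from one STS image on a regular grid."""
    n_rows, n_cols = sts.shape
    return [
        sts[i:i + patch_size, j:j + patch_size]
        for i in range(0, n_rows - patch_size + 1, stride)
        for j in range(0, n_cols - patch_size + 1, stride)
    ]


def build_training_set(video, mos, patch_size=32, stride=32,
                       noise_std=0.05, seed=0):
    """Give every patch the video's subjective score plus random noise.

    Assumption: zero-mean Gaussian noise with a hypothetical std.
    """
    rng = np.random.default_rng(seed)
    patches = []
    for sts in extract_sts_images(video):
        patches.extend(sample_patches(sts, patch_size, stride))
    labels = mos + rng.normal(0.0, noise_std, size=len(patches))
    return np.stack(patches), labels


def predict_video_quality(video, patch_model, patch_size=32, stride=32):
    """Average patch predictions per STS image, then across STS images.

    Assumes every slice is at least patch_size in both dimensions.
    """
    per_sts = []
    for sts in extract_sts_images(video):
        scores = [patch_model(p)
                  for p in sample_patches(sts, patch_size, stride)]
        per_sts.append(np.mean(scores))
    return float(np.mean(per_sts))
```

In this sketch a single video with one subjective score yields many labeled patches, which is how patch sampling enlarges a small training set. At test time, `predict_video_quality(video, patch_model)` would be called with whatever trained patch-level regressor is available.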
