Abstract
With the constantly growing popularity of video-based services and applications, no-reference video quality assessment (NR-VQA) has become an active research topic. Over the years, many different approaches have been introduced in the literature to evaluate the perceptual quality of digital videos. Owing to the advent of large benchmark video quality assessment databases, deep learning has attracted significant attention in this field in recent years. This paper presents a novel deep learning-based approach to NR-VQA that relies on a set of pre-trained convolutional neural networks (CNNs) applied in parallel to characterize a wide range of potential image and video distortions. Specifically, temporally pooled and saliency-weighted video-level deep features are extracted with the help of a set of pre-trained CNNs and mapped onto perceptual quality scores independently of each other. Finally, the quality scores produced by the different regressors are fused to obtain the perceptual quality of a given video sequence. Extensive experiments demonstrate that the proposed method sets a new state of the art on two large benchmark video quality assessment databases with authentic distortions. Moreover, the presented results underline that the decision fusion of multiple deep architectures can significantly benefit NR-VQA.
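The feature-extraction stage described in the abstract, saliency-weighted pooling of per-frame CNN feature maps followed by temporal pooling over the frames, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes (feature maps of shape H×W×C, a saliency map of shape H×W); the function names and the simple arithmetic temporal mean are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def saliency_weighted_pooling(feature_maps, saliency):
    """Saliency-weighted global average pooling (SWGAP) of one frame.
    feature_maps: (H, W, C) CNN activations; saliency: (H, W) map.
    Returns a (C,) frame-level feature vector."""
    weights = saliency / (saliency.sum() + 1e-8)  # normalize to sum to ~1
    # Weighted sum over the spatial dimensions yields one value per channel.
    return np.tensordot(weights, feature_maps, axes=([0, 1], [0, 1]))

def video_level_features(frames, saliencies):
    """Temporal pooling: average the frame-level SWGAP vectors over time."""
    pooled = [saliency_weighted_pooling(f, s)
              for f, s in zip(frames, saliencies)]
    return np.mean(pooled, axis=0)
```

The resulting video-level vector is what would then be fed to a per-backbone quality regressor.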
Highlights
Measuring the quality of digital videos has been a hot and important research topic in the literature
We demonstrate that the decision fusion of multiple deep architectures significantly improves the performance of no-reference video quality assessment (NR-VQA)
The proposed method, code-named SWDF-DF-VQA in the following, applies Gaussian process regressors (GPRs) with rational quadratic kernel functions, saliency-weighted global average pooling (SWGAP) layers, and arithmetic averaging for decision fusion
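The regression-and-fusion step named in the highlights can be illustrated with scikit-learn, which provides a `GaussianProcessRegressor` and a `RationalQuadratic` kernel directly. The sketch below uses made-up random features and MOS labels in place of the real video-level deep features; everything except the GPR/kernel classes and the arithmetic-mean fusion is a placeholder assumption.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic

# Made-up stand-ins: feature matrices from two hypothetical CNN backbones
# and synthetic MOS labels (real inputs would be video-level deep features).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(40, 16))    # features from backbone A
feats_b = rng.normal(size=(40, 16))    # features from backbone B
mos = rng.uniform(1.0, 5.0, size=40)   # mean opinion scores

# Train one GPR with a rational quadratic kernel per backbone.
regressors = []
for X in (feats_a, feats_b):
    gpr = GaussianProcessRegressor(kernel=RationalQuadratic(),
                                   normalize_y=True)
    gpr.fit(X, mos)
    regressors.append(gpr)

# Decision fusion: arithmetic mean of the per-backbone quality predictions.
preds = np.mean([g.predict(X)
                 for g, X in zip(regressors, (feats_a, feats_b))], axis=0)
```

In practice each regressor would be trained on features from a different pre-trained CNN, and the fused prediction is the final quality estimate for the video.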
Summary
Measuring the quality of digital videos has been a hot and important research topic in the literature. Each stage of a video's processing chain affects it in some way and, in most cases, introduces some type of artifact or noise. These artifacts, such as blur, geometric distortions, or blockiness from compression standards, degrade the perceptual quality of the digital video. Subjective VQA provides benchmark databases [5–7] that contain video sequences together with their corresponding mean opinion score (MOS) values. These databases are extensively applied as training or testing data by objective VQA methods, which aim to construct mathematical models that accurately estimate the perceptual quality of video sequences.