Abstract

With the constantly growing popularity of video-based services and applications, no-reference video quality assessment (NR-VQA) has become a hot research topic. Over the years, many different approaches have been introduced in the literature to evaluate the perceptual quality of digital videos. Owing to the advent of large benchmark video quality assessment databases, deep learning has attracted significant attention in this field in recent years. This paper presents a novel deep learning-based approach to NR-VQA that relies on a set of pre-trained convolutional neural networks (CNNs) applied in parallel to characterize a broad range of potential image and video distortions. Specifically, temporally pooled and saliency weighted video-level deep features are extracted with the help of a set of pre-trained CNNs and mapped onto perceptual quality scores independently of one another. Finally, the quality scores coming from the different regressors are fused together to obtain the perceptual quality of a given video sequence. Extensive experiments demonstrate that the proposed method sets a new state of the art on two large benchmark video quality assessment databases with authentic distortions. Moreover, the presented results underline that the decision fusion of multiple deep architectures can significantly benefit NR-VQA.
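As a compact illustration of this pipeline, the following Python sketch combines parallel pre-trained torchvision backbones, mean/std temporal pooling of frame-level features, and an arithmetic average over per-backbone quality predictions. The backbone choices, the pooling scheme, and the regressor interface are illustrative assumptions, not the exact configuration of the paper.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    def make_extractor(net):
        # Drop the final classification layer but keep global average pooling,
        # so the output is a frame-level deep feature vector.
        return nn.Sequential(*list(net.children())[:-1]).eval()

    # A set of pre-trained CNNs applied in parallel (illustrative backbones).
    extractors = {
        "resnet18": make_extractor(models.resnet18(weights="IMAGENET1K_V1")),
        "resnet50": make_extractor(models.resnet50(weights="IMAGENET1K_V1")),
    }

    @torch.no_grad()
    def video_feature(frames, extractor):
        # frames: (T, 3, 224, 224) tensor, normalized with ImageNet statistics.
        f = extractor(frames).flatten(1)          # (T, D) frame-level features
        # Mean/std temporal pooling stands in for the paper's pooling scheme.
        return torch.cat([f.mean(dim=0), f.std(dim=0)])

    def fused_quality(frames, regressors):
        # Decision fusion: arithmetic mean of per-backbone predictions.
        # regressors maps each backbone name to a fitted scikit-learn style
        # regressor trained on video-level features and MOS labels.
        scores = [
            regressors[name].predict(video_feature(frames, ext).numpy()[None, :])[0]
            for name, ext in extractors.items()
        ]
        return float(sum(scores)) / len(scores)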

Highlights

  • Measuring the quality of digital videos has long been an important and active research topic

  • We demonstrate that the decision fusion of multiple deep architectures significantly improves the performance of no-reference video quality assessment (NR-VQA)

  • Gaussian process regressors (GPRs) with rational quadratic kernel functions, saliency weighted global average pooling (SWGAP) layers, and arithmetic averaging as decision fusion are applied in the proposed method, which is code-named SWDF-DF-VQA in the following; a minimal sketch of these components appears after this list
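These three components can be sketched in a few lines with scikit-learn; the feature shapes, the saliency source, and the training data below are assumptions for illustration rather than the paper's exact setup:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RationalQuadratic

    def swgap(feature_map, saliency):
        # Saliency weighted global average pooling (SWGAP) for one frame.
        # feature_map: (C, H, W) CNN activations; saliency: (H, W) saliency map.
        w = saliency / (saliency.sum() + 1e-8)           # normalized weight map
        return (feature_map * w[None, :, :]).sum(axis=(1, 2))   # (C,) vector

    def fit_gpr(X_train, y_train):
        # One GPR with a rational quadratic kernel per deep architecture.
        # X_train: (N, D) video-level features; y_train: (N,) MOS values.
        gpr = GaussianProcessRegressor(kernel=RationalQuadratic(), normalize_y=True)
        return gpr.fit(X_train, y_train)

    def fuse(gprs, features_per_backbone):
        # Arithmetic average of per-architecture predictions (decision fusion).
        preds = [g.predict(x[None, :])[0] for g, x in zip(gprs, features_per_backbone)]
        return float(np.mean(preds))

Because each architecture keeps its own regressor in this setup, an additional backbone can be added to (or removed from) the fusion without retraining the others.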

Summary

Introduction

Measuring the quality of digital videos has long been an important and active research topic. From capture through compression and transmission to display, each processing stage affects the video in a certain way, and in most cases it introduces some type of artifact or noise. These artifacts, such as blur, geometric distortions, or blockiness arising from compression standards, degrade the perceptual quality of the digital video. Subjective VQA provides benchmark databases [5–7] that contain video sequences together with their corresponding mean opinion score (MOS) values. These databases are extensively applied as training or testing data by objective VQA methods, which aim to construct mathematical models that accurately estimate the perceptual quality of video sequences.

Objective
Literature Review
Frame-Level Feature Extraction
Video-Level Feature Extraction
Applied Benchmark VQA Databases
Evaluation Protocol
Results
Ablation Study
Comparison to the State-of-the-Art
Conclusions