Abstract

Blind or no-reference video quality assessment of user-generated content (UGC) has become a trending and challenging problem that remains unsolved. Accurate and efficient video quality predictors suitable for this content are thus in great demand to enable more intelligent analysis and processing of UGC videos. Previous studies have shown that natural scene statistics and deep learning features are both sufficient to capture spatial distortions, which account for a significant portion of UGC video quality issues. However, these models are either incapable of, or inefficient at, predicting the quality of complex and diverse UGC videos in practical applications. Here we introduce an effective and efficient video quality model for UGC content, which we dub the Rapid and Accurate Video Quality Evaluator (RAPIQUE), and which we show performs comparably to state-of-the-art (SOTA) models at orders-of-magnitude faster runtime. RAPIQUE combines and leverages the advantages of both quality-aware scene statistics features and semantics-aware deep convolutional features, allowing us to design the first general and efficient spatial and temporal (space-time) bandpass statistics model for video quality modeling. Our experimental results on recent large-scale UGC video quality databases show that RAPIQUE delivers top performance on all the datasets at considerably lower computational expense. We hope this work promotes and inspires further efforts towards practical modeling of video quality problems for potential real-time and low-latency applications. To promote public usage, an implementation of RAPIQUE is freely available online: https://github.com/vztu/RAPIQUE.
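As a rough illustration of the two-branch design described above, the following is a minimal sketch assuming per-video bandpass/NSS statistics and pooled deep CNN features have already been extracted; the function names, feature shapes, and the choice of a linear SVR regressor are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a RAPIQUE-style two-branch quality model:
# hand-crafted bandpass/NSS statistics are concatenated with pooled deep
# CNN features, and a learned regressor maps the fused vector to quality.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fuse_features(nss_feats: np.ndarray, deep_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-video NSS statistics and pooled deep features."""
    return np.concatenate([nss_feats, deep_feats], axis=-1)

def train_quality_regressor(X_nss: np.ndarray, X_deep: np.ndarray, y_mos: np.ndarray):
    """Fit a simple regressor from fused features to subjective MOS."""
    X = fuse_features(X_nss, X_deep)  # shape: (num_videos, d1 + d2)
    model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
    model.fit(X, y_mos)
    return model
```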

Highlights

  • Recent years have witnessed an explosion of user-generated content (UGC) captured and streamed over social media platforms such as YouTube, Facebook, TikTok, and Twitter

  • We have found that the mean-subtracted, contrast-normalized (MSCN) coefficients of the temporal bandpass coefficients of natural videos exhibit a Gaussian-like appearance, as shown in Fig. 8, while these regularities are modified by the presence of distortion, strongly suggesting the possibility of quantifying such deviations to predict perceived video quality (see the MSCN sketch following this list)

  • Note that the Pearson Linear Correlation Coefficient (PLCC) and Root Mean Square Error (RMSE) are computed after fitting a nonlinear four-parameter logistic regression to linearize the objective predictions onto the same scale as MOS [1]: $f(x) = \beta_2 + \frac{\beta_1 - \beta_2}{1 + \exp\left(-(x - \beta_3)/|\beta_4|\right)}$ (a fitting sketch is given below)
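
To make the MSCN computation in the highlight above concrete, here is a minimal sketch, not the authors' code, of mean-subtracted contrast normalization applied to a crudely approximated temporal bandpass band; the Gaussian window scale `sigma`, the stabilizing constant `C`, and the frame-difference stand-in for the temporal filter bank are assumptions for illustration.

```python
# Minimal sketch of MSCN (mean-subtracted, contrast-normalized) coefficients
# of a temporal bandpass band; window scale and constants are assumed values.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(band: np.ndarray, sigma: float = 7.0 / 6.0, C: float = 1.0) -> np.ndarray:
    """Divisively normalize a 2-D band by its local mean and std. deviation."""
    band = band.astype(np.float64)
    mu = gaussian_filter(band, sigma)                    # local (Gaussian-weighted) mean
    var = gaussian_filter(band * band, sigma) - mu ** 2  # local variance
    sd = np.sqrt(np.maximum(var, 0.0))                   # local standard deviation
    return (band - mu) / (sd + C)

def temporal_band(frame_t: np.ndarray, frame_prev: np.ndarray) -> np.ndarray:
    """Crude temporal bandpass stand-in: difference of adjacent luma frames."""
    return frame_t.astype(np.float64) - frame_prev.astype(np.float64)

# For pristine content, the histogram of mscn(temporal_band(f1, f0)).ravel()
# tends to look Gaussian-like; distortions visibly reshape that histogram.
```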

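Below is a hedged sketch of fitting the four-parameter logistic mapping before computing PLCC and RMSE, following standard practice [1]; the initial parameter guesses and the use of SciPy's `curve_fit` are illustrative choices, not the evaluation code used in the paper.

```python
# Sketch of the standard four-parameter logistic mapping used to linearize
# objective predictions against MOS before computing PLCC and RMSE.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

def logistic4(x, b1, b2, b3, b4):
    return b2 + (b1 - b2) / (1.0 + np.exp(-(x - b3) / np.abs(b4)))

def plcc_rmse(pred: np.ndarray, mos: np.ndarray):
    """Fit the logistic mapping, then report PLCC and RMSE on the MOS scale."""
    # Assumed initial guesses for the four parameters.
    p0 = [mos.max(), mos.min(), float(np.mean(pred)), float(np.std(pred)) + 1e-6]
    beta, _ = curve_fit(logistic4, pred, mos, p0=p0, maxfev=10000)
    mapped = logistic4(pred, *beta)
    plcc = pearsonr(mapped, mos)[0]
    rmse = float(np.sqrt(np.mean((mapped - mos) ** 2)))
    return plcc, rmse
```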


Introduction

Recent years have witnessed an explosion of user-generated content (UGC) captured and streamed over social media platforms such as YouTube, Facebook, TikTok, and Twitter. UGC videos, which are typically created by amateur videographers, often suffer from unsatisfactory perceptual quality arising from imperfect capture devices, uncertain shooting skills, a variety of possible content processing operations, and compression and streaming distortions. In this regard, predicting UGC video quality is much more challenging than assessing the quality of synthetically distorted videos in traditional video databases. While full-reference (FR) VQA research is gradually maturing and several algorithms [2], [3] are quite widely deployed, recent attention has shifted towards creating better no-reference (NR) VQA models that can be used to predict and monitor the quality of authentically distorted UGC videos. One such application is adaptive encoding of ingested UGC: the decision and tuning strategy of such an adaptive encoding scheme would require the guidance of an accurate and efficient NR, or blind, video quality assessment (BVQA) model suitable for UGC [6].
