Abstract
Video quality assessment (VQA) is an important technique in video service systems. In recent years, the development of deep learning has opened new possibilities for VQA. This paper proposes a no-reference VQA (NR-VQA) method for in-the-wild videos that combines an attention mechanism with human visual perception. First, a deep network consisting of a convolutional neural network and an attention mechanism is constructed to extract deep perceptual features from frame-level images, and global covariance pooling is applied to the downsampled features to capture their second-order statistics. Second, a Transformer network performs temporal modeling to learn long-term dependencies for perceptual quality prediction. Finally, a temporal weighting strategy based on visual perception computes a weighted sum of the frame-level scores to obtain the final video quality score. Experiments on three public user-generated-content databases with authentic distortions, namely KoNViD-1k, CVD2014, and LIVE-VQC, demonstrate that the proposed method achieves effective quality assessment under authentic distortion and outperforms several recent NR-VQA methods.
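To make the described pipeline concrete, the following is a minimal PyTorch sketch of its four stages: frame-level CNN features, global covariance pooling (second-order statistics), Transformer-based temporal modeling, and temporally weighted aggregation of frame scores. All module names, dimensions, and the ResNet-18 backbone are illustrative assumptions rather than the authors' implementation, and the frame-level attention module described in the abstract is omitted for brevity.

```python
# Hypothetical sketch of the abstract's pipeline; not the authors' code.
import torch
import torch.nn as nn
import torchvision.models as models


class CovariancePooling(nn.Module):
    """Global covariance pooling: second-order statistics of spatial features."""
    def forward(self, x):                     # x: (B, C, H, W)
        b, c, h, w = x.shape
        feats = x.flatten(2)                  # (B, C, H*W)
        feats = feats - feats.mean(dim=2, keepdim=True)
        cov = feats @ feats.transpose(1, 2) / (h * w - 1)  # (B, C, C)
        return cov.flatten(1)                 # (B, C*C)


class VQASketch(nn.Module):
    def __init__(self, dim=128, n_heads=4, n_layers=2):
        super().__init__()
        backbone = models.resnet18(weights=None)           # stand-in CNN backbone
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.reduce = nn.Conv2d(512, 32, kernel_size=1)    # shrink channels before C x C covariance
        self.cov_pool = CovariancePooling()
        self.proj = nn.Linear(32 * 32, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.frame_score = nn.Linear(dim, 1)               # per-frame quality score
        self.frame_weight = nn.Linear(dim, 1)              # temporal-weighting logits

    def forward(self, frames):                # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1))    # (B*T, 512, h, w)
        x = self.reduce(x)                    # (B*T, 32, h, w)
        x = self.proj(self.cov_pool(x))       # (B*T, dim)
        x = x.view(b, t, -1)
        x = self.temporal(x)                  # long-range temporal modeling
        scores = self.frame_score(x).squeeze(-1)           # (B, T)
        weights = torch.softmax(self.frame_weight(x).squeeze(-1), dim=1)
        return (weights * scores).sum(dim=1)  # weighted sum -> video-level score


video = torch.randn(2, 8, 3, 224, 224)        # 2 clips, 8 frames each
print(VQASketch()(video).shape)               # torch.Size([2])
```

Here the learned softmax weights stand in for the visual-perception-based temporal weighting strategy; the paper's actual weighting rule may differ.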