Abstract

With the development of video capture devices and network technology, recent years have witnessed an exponential increase in user-generated content (UGC) videos on various sharing platforms. Compared with professionally generated content (PGC) videos, UGC videos are generally captured by amateurs using smartphone cameras in everyday scenes and therefore contain various in-capture distortions. In addition, these videos pass through multiple processing stages that may affect their perceptual quality before they are finally viewed by end users. The complex and diverse distortion types make objective quality assessment difficult. In this paper, we present a data-driven video quality assessment (VQA) method for UGC videos based on a convolutional neural network (CNN) and a Transformer. Specifically, the CNN backbone extracts features from video frames, and its output is fed to a Transformer encoder to predict the quality score. The proposed method can be used for both full-reference (FR) and no-reference (NR) VQA with slight adaptations. Our method ranks first in the MOS track and second in the DMOS track of the challenge on quality assessment of compressed UGC videos [1].
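To make the described pipeline concrete, the sketch below shows one plausible way to combine a frame-level CNN backbone with a Transformer encoder for NR-VQA, roughly following the abstract. All specifics (ResNet-50 backbone, embedding size, number of encoder layers, mean pooling over frames) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal NR-VQA sketch: per-frame CNN features -> Transformer encoder -> score.
# Architecture choices here (ResNet-50, d_model=512, 2 layers) are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class CNNTransformerVQA(nn.Module):
    def __init__(self, d_model=512, n_layers=2, n_heads=8):
        super().__init__()
        backbone = resnet50(weights=None)
        # Drop the classification head; keep the 2048-d pooled frame features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.proj = nn.Linear(2048, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # regress a single quality score

    def forward(self, frames):
        # frames: (batch, num_frames, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.view(b * t, c, h, w)).flatten(1)  # (b*t, 2048)
        feats = self.proj(feats).view(b, t, -1)            # (b, t, d_model)
        encoded = self.encoder(feats)                      # temporal modeling across frames
        return self.head(encoded.mean(dim=1)).squeeze(-1)  # one score per video


# Example: two clips of 8 sampled frames each.
model = CNNTransformerVQA()
scores = model(torch.randn(2, 8, 3, 224, 224))
```

An FR variant could follow the same structure by feeding features of reference and distorted frames (or their difference) to the encoder, which is consistent with the "slight adaptations" mentioned above but is likewise only an assumption here.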
