Abstract

The prevalence of short-video applications imposes new requirements on video quality assessment (VQA). User-generated content (UGC) videos are captured in unprofessional environments and thus suffer from various dynamic degradations, such as camera shaking. To cover these dynamic degradations, existing recurrent neural network-based UGC-VQA methods can only provide implicit modeling, which is opaque and difficult to analyze. In this work, we consider explicit motion representation for dynamic degradations and propose a motion-enhanced UGC-VQA method based on decomposition and recomposition. In the decomposition stage, a dual-stream decomposition module is built, and the VQA task is decomposed into single-frame quality assessment and cross-frame motion understanding. The dual streams are well grounded in the two-pathway visual system of human perception and require no extra UGC data thanks to knowledge transfer. Hierarchical features from shallow to deep layers are gathered to narrow the task and domain gaps. In the recomposition stage, a progressive residual aggregation module is built to recompose features from the dual streams. Representations from different layers and pathways interact and are aggregated in a progressive, residual manner, which maintains a good trade-off between representation deficiency and redundancy. Extensive experiments on UGC-VQA databases verify that our method achieves state-of-the-art performance and generalizes well. The source code will be available at https://github.com/Sissuire/DSD-PRO.
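For intuition, the following is a minimal PyTorch sketch of the progressive residual recomposition idea described above. All names, layer widths, and the assumption of pre-pooled, equal-width features in both pathways are illustrative and are not taken from the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class ProgressiveResidualAggregation(nn.Module):
    """Recompose hierarchical features from two pathways.

    At each level, spatial (single-frame) and motion (cross-frame)
    features are fused, then added residually to a running
    representation, so deeper levels refine rather than replace
    the shallower ones.
    """
    def __init__(self, level_dims, hidden=256):
        super().__init__()
        # One fusion projection per hierarchy level (hypothetical widths).
        self.fuse = nn.ModuleList(
            nn.Linear(2 * d, hidden) for d in level_dims
        )
        self.head = nn.Linear(hidden, 1)  # regress a scalar quality score

    def forward(self, spatial_feats, motion_feats):
        agg = 0.0
        for fuse, s, m in zip(self.fuse, spatial_feats, motion_feats):
            fused = torch.relu(fuse(torch.cat([s, m], dim=-1)))
            agg = agg + fused  # progressive, residual accumulation
        return self.head(agg)

# Toy usage: three hierarchy levels, a batch of 4 videos, features
# already pooled to vectors (all dimensions are illustrative).
level_dims = [64, 128, 256]
spatial = [torch.randn(4, d) for d in level_dims]
motion = [torch.randn(4, d) for d in level_dims]
model = ProgressiveResidualAggregation(level_dims)
score = model(spatial, motion)  # shape (4, 1)
```

In this sketch, the residual accumulation means each deeper level contributes a correction on top of the shallower aggregate, which is one simple way to balance representation deficiency (using too few levels) against redundancy (concatenating everything).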
