Abstract

In this paper, we propose a deep neural network-based no-reference (NR) video quality assessment (VQA) method with spatiotemporal feature fusion and hierarchical information integration to evaluate the perceptual quality of videos. First, we build a feature extraction model that uses 2D and 3D convolutional layers to gradually extract spatiotemporal features from raw video clips. Second, we design a hierarchical branching network to fuse multiframe features, with the feature vectors at each hierarchical level considered jointly during network optimization. Finally, these two modules and a quality regression module are combined into an end-to-end architecture. Experimental results on benchmark VQA databases demonstrate the superiority of our method over other state-of-the-art algorithms. The source code is available online.
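
As a rough illustration of the pipeline the abstract describes, the PyTorch sketch below combines per-frame 2D convolutions with 3D convolutions for spatiotemporal feature extraction, followed by a hierarchical fusion stage that emits a quality score at every level so all levels can be supervised jointly. All layer sizes, the module names (SpatiotemporalExtractor, HierarchicalFusion), and the number of hierarchy levels are illustrative assumptions; the paper's actual configuration is not specified in the abstract.

```python
# Minimal sketch of the described architecture, with assumed layer sizes.
import torch
import torch.nn as nn


class SpatiotemporalExtractor(nn.Module):
    """Mixes 2D (per-frame spatial) and 3D (spatiotemporal) convolutions."""

    def __init__(self):
        super().__init__()
        # 2D conv applied frame by frame for spatial features (assumed sizes).
        self.conv2d = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # 3D conv over the clip for temporal context (assumed sizes).
        self.conv3d = nn.Sequential(
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep time axis
        )

    def forward(self, clip):
        # clip: (batch, channels, frames, height, width)
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        spatial = self.conv2d(frames)                     # (b*t, 32, h', w')
        _, c2, h2, w2 = spatial.shape
        spatial = spatial.reshape(b, t, c2, h2, w2).permute(0, 2, 1, 3, 4)
        feats = self.conv3d(spatial)                      # (b, 64, t, 1, 1)
        return feats.squeeze(-1).squeeze(-1)              # (b, 64, t)


class HierarchicalFusion(nn.Module):
    """Fuses multiframe features level by level; each level also emits a
    quality score so every hierarchical stage contributes to the loss."""

    def __init__(self, dim=64, levels=3):
        super().__init__()
        self.fuse = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=2, stride=2) for _ in range(levels)
        )
        self.heads = nn.ModuleList(nn.Linear(dim, 1) for _ in range(levels))

    def forward(self, feats):
        # feats: (batch, dim, frames); frames assumed divisible by 2**levels.
        scores = []
        for fuse, head in zip(self.fuse, self.heads):
            feats = torch.relu(fuse(feats))          # halve the temporal length
            scores.append(head(feats.mean(dim=2)))   # per-level quality score
        return scores                                # supervise all levels


if __name__ == "__main__":
    extractor, fusion = SpatiotemporalExtractor(), HierarchicalFusion()
    clip = torch.randn(2, 3, 8, 64, 64)     # dummy (batch, C, T, H, W) clip
    scores = fusion(extractor(clip))
    print([s.shape for s in scores])        # three (2, 1) per-level predictions
```

In an end-to-end setup of this kind, the per-level scores would typically each be compared against the ground-truth quality label and the losses summed, which is one plausible reading of "comprehensively considered during the process of network optimization."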
