Abstract
In this work, we address the problem of full-reference video quality prediction using deep-learning-based spatio-temporal representations of natural videos. Specifically, we use feature representations derived from a per-voxel deep learning regression model that predicts the functional Magnetic Resonance Imaging (fMRI) responses of the visual cortical regions to natural video stimuli. We construct a rudimentary full-reference spatio-temporal quality feature that is simply the L1-norm of the error between the voxel model’s responses to the reference and test video stimuli. This feature is shown to correlate well with subjective quality scores. Additionally, we use the Multi-Scale Structural Similarity (MS-SSIM) index as the spatial quality feature. We show that the combination of the proposed spatio-temporal feature and the spatial (MS-SSIM) feature delivers competitive performance on both Quality of Experience (QoE) prediction and Video Quality Assessment (VQA) tasks. This finding not only corroborates previous results, based on electroencephalography (EEG) signals, on the role of the visual cortex in quality prediction but also opens up interesting directions for the perceptually inspired design of objective video quality metrics.
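As a rough illustration of the spatio-temporal feature described above, the following minimal sketch computes the L1-norm of the error between two predicted voxel response arrays. The function name `l1_response_error` and the array layout (time steps by voxels) are assumptions made for illustration only; the abstract does not specify how the voxel model's outputs are stored, normalized, or pooled, so this should not be read as the authors' implementation.

```python
import numpy as np


def l1_response_error(ref_response, test_response):
    """Return the L1-norm of the error between two voxel response arrays.

    Both inputs are assumed (hypothetically) to hold the voxel model's
    predicted responses, e.g. with shape (time_steps, voxels).
    """
    ref = np.asarray(ref_response, dtype=float)
    test = np.asarray(test_response, dtype=float)
    if ref.shape != test.shape:
        raise ValueError("reference and test responses must have the same shape")
    # Sum of absolute element-wise differences over all time steps and voxels.
    return float(np.abs(ref - test).sum())


# Toy example with made-up numbers standing in for model outputs.
reference = [[0.5, 1.0], [0.0, 2.0]]
distorted = [[0.5, 0.0], [1.0, 2.0]]
feature = l1_response_error(reference, distorted)
```

A larger value indicates that the predicted responses to the test video deviate more from those to the reference. How this value is combined with MS-SSIM is not stated in the abstract and is left out of the sketch.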