Abstract
This article first presents a new Viewport-based OmniDirectional Video Quality Assessment (VOD-VQA) database. It includes eighteen salient viewport videos extracted from the original OmniDirectional Videos (ODVs) and 774 corresponding impaired samples, generated by compressing the raw viewports with a variety of combinations of their Spatial (frame size $s$), Temporal (frame rate $t$), and Amplitude (quantization stepsize $q$) Resolutions (STAR). A total of 160 subjects assessed the processed viewport videos, rendered on a head-mounted display (HMD), while stabilizing their fixations. We then formulate an analytical model that connects the perceptual quality of a compressed viewport video with its STAR variables, denoted as the $Q^{{\mathsf {VP}}}_{\tt {STAR}}$ index. All four model parameters can be predicted using linearly weighted content features, so the proposed metric generalizes to various contents.
This model correlates well with the mean opinion scores (MOSs) collected for processed viewport videos: both the Pearson Correlation Coefficient and Spearman's Rank Correlation Coefficient (SRCC) reach 0.95 in an independent validation test, which is state-of-the-art performance compared with popular objective metrics (e.g., Weighted-to-Spherically-uniform Peak Signal-to-Noise Ratio (WS-PSNR), WMS-SSIM, Video Multimethod Assessment Fusion (VMAF), Feature SIMilarity Index (FSIM), and Visual Saliency based IQA Index (VSI)). Furthermore, the viewport-based quality index $Q^{{\mathsf {VP}}}_{\tt {STAR}}$ is extended to infer the overall ODV quality, denoted $Q^{{\mathsf {ODV}}}_{\tt {STAR}}$, by linearly weighting the saliency-aggregated qualities of salient viewports and the quality of the quick-scanning (or non-salient) area. Experiments on four additional independent, third-party ODV assessment datasets show that the inferred $Q^{{\mathsf {ODV}}}_{\tt {STAR}}$ accurately predicts the MOS, with performance competitive with the state-of-the-art algorithm. All related materials are publicly accessible at https://vision.nju.edu.cn/20/86/c29466a467078/page.htm for reproducible research.
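The two agreement measures reported above, the Pearson Correlation Coefficient and the SRCC, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the sample scores are hypothetical, and the SRCC is computed in the usual way as the Pearson correlation of average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not data from the database.
    mos = [82.0, 74.5, 61.0, 47.5, 30.0]
    predicted = [80.1, 76.0, 58.3, 50.2, 28.7]
    print("Pearson:", round(pearson(predicted, mos), 4))
    print("SRCC:", round(spearman(predicted, mos), 4))
```

The Pearson coefficient measures how linearly the predictions track the MOS, while the SRCC only checks that the two agree on ordering, which is why quality metrics are commonly reported with both.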
More From: IEEE Transactions on Circuits and Systems for Video Technology