Abstract
Panoramic video and stereoscopic panoramic video are essential carriers of virtual reality content, so establishing quality assessment models for them is crucial to the standardization of the virtual reality industry. However, evaluating the quality of panoramic video remains very challenging. One reason is that the spatial information of panoramic video is warped by the projection process, which conventional video quality assessment (VQA) methods struggle to handle. Another reason is that traditional VQA methods have difficulty capturing the complex global temporal information in panoramic video. To address these problems, this paper presents an end-to-end neural network model for evaluating the quality of panoramic video and stereoscopic panoramic video. Unlike other panoramic video quality assessment methods, our proposed method combines spherical convolutional neural networks (CNN) and non-local neural networks, which can effectively extract the complex spatiotemporal information of panoramic video. We evaluate the method on two databases, VRQ-TJU and VR-VQA48. Experiments demonstrate the effectiveness of the different modules in our method, and our method outperforms other state-of-the-art related methods.
Highlights
As a new means of simulation and interaction, virtual reality (VR) has attracted more and more attention in recent years [1].
A representative method of this idea is the weighted-to-spherically-uniform PSNR (WS-PSNR) [21].
To resolve the contradiction between convolutional neural networks (CNN) and global temporal information, non-local neural networks are integrated into our proposed framework.
We propose a deep-learning-based method that can evaluate the quality of panoramic video and stereoscopic panoramic video end-to-end.
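For readers unfamiliar with the WS-PSNR named in the highlights, the following is a minimal sketch of how it is commonly computed. It assumes frames in equirectangular projection (this page does not state the projection) and single-channel frames given as lists of rows. The function name `ws_psnr` and its parameters are illustrative and not taken from the paper.

```python
import math


def ws_psnr(ref, dist, max_val=255.0):
    """Sketch of WS-PSNR for two equal-sized equirectangular frames.

    ref, dist: lists of rows, each row a list of pixel intensities.
    max_val:   peak intensity (255 for 8-bit content, an assumption here).
    """
    height = len(ref)
    weighted_err = 0.0
    weight_sum = 0.0
    for j in range(height):
        # Row weight: close to 1 near the equator, close to 0 near the poles,
        # compensating for the stretching of polar regions in the projection.
        w = math.cos((j + 0.5 - height / 2) * math.pi / height)
        for r, d in zip(ref[j], dist[j]):
            weighted_err += w * (r - d) ** 2
            weight_sum += w
    wmse = weighted_err / weight_sum  # weighted mean squared error
    if wmse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_val ** 2 / wmse)
```

Under this weighting, the same error placed near a pole lowers the score less than it would near the equator, which is the behaviour a plain PSNR cannot provide.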
Summary
As a new means of simulation and interaction, virtual reality (VR) has attracted more and more attention in recent years [1]. Yu et al. [22] projected the pixels of the original and the distorted panoramic video planes onto a sphere and sampled a large number of uniformly distributed points on the spherical surface to calculate the PSNR. They proposed two indicators, S-PSNR and L-PSNR, which differ in whether they give higher weight to the equator. The non-local neural network module [25] adds attention information to the feature maps of the network, so the global temporal information of the panoramic video can be extracted together with the spherical CNN. We elaborate on the characteristics of panoramic video and related work (Section II), analyze our method (Section III), evaluate it through extensive experiments (Section IV), and draw conclusions and discuss future directions (Section V).
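The attention behaviour of a non-local module can be illustrated with a deliberately simplified sketch. It assumes the embedded-Gaussian (softmax) form of pairwise similarity and replaces the learned embedding and output projections with the identity, so it shows only the core operation and is not the configuration used in the paper. The name `non_local` and the input layout (one feature vector per frame or position) are hypothetical.

```python
import math


def non_local(features):
    """Simplified non-local operation over a list of feature vectors.

    features: list of T vectors (lists of floats), e.g. one per frame.
    Returns T vectors, each the input plus an attention-weighted sum
    of all positions (a residual connection).
    """
    out = []
    for x_i in features:
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Softmax normalisation (shifted by the max for numerical stability).
        m = max(scores)
        exp_scores = [math.exp(s - m) for s in scores]
        total = sum(exp_scores)
        attn = [e / total for e in exp_scores]
        # Aggregate information from all positions, near or far in time.
        y_i = [
            sum(w * x_j[k] for w, x_j in zip(attn, features))
            for k in range(len(x_i))
        ]
        # Residual connection keeps the original feature.
        out.append([a + b for a, b in zip(x_i, y_i)])
    return out
```

Because every output position draws on all input positions, a frame's feature can be influenced by frames far away in time, which local convolutions alone do not provide.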