Abstract

Panoramic video and stereoscopic panoramic video are essential carriers of virtual reality content, so establishing quality assessment models for them is crucial to the standardization of the virtual reality industry. However, evaluating the quality of panoramic video remains challenging. One reason is that the spatial information of panoramic video is warped by the projection process, which conventional video quality assessment (VQA) methods struggle to handle. Another is that traditional VQA methods have difficulty capturing the complex global temporal information in panoramic video. To address these problems, this paper presents an end-to-end neural network model for evaluating the quality of panoramic video and stereoscopic panoramic video. Unlike other panoramic video quality assessment methods, the proposed method combines spherical convolutional neural networks (CNNs) and non-local neural networks, which together effectively extract the complex spatiotemporal information of panoramic video. We evaluate the method on two databases, VRQ-TJU and VR-VQA48. Experiments show the effectiveness of the different modules in our method, and our method outperforms state-of-the-art related methods.

Highlights

  • As a new means of simulation and interaction, virtual reality (VR) has attracted increasing attention in recent years [1]

  • As a representative method of this idea, weighted-to-spherically-uniform PSNR (WS-PSNR) [21] weights per-pixel error by the spherical area each pixel covers after projection (a sketch of the computation follows this list). To resolve the contradiction between convolutional neural networks (CNN) and global time domain information, non-local neural networks are integrated into our proposed framework

  • We propose a deep learning based method that evaluates the quality of panoramic video and stereoscopic panoramic video end-to-end
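The formula dropped from the second highlight during extraction is, following [21], a latitude-weighted mean squared error: each pixel's error counts in proportion to the spherical area it covers after equirectangular projection. Below is a minimal NumPy sketch under that reading; the function name and the single-channel, single-frame simplification are ours, not the paper's code.

    import numpy as np

    def ws_psnr(ref, dist, max_val=255.0):
        """WS-PSNR for two (H, W) equirectangular frames of identical shape."""
        h, w = ref.shape
        # Row weight w(j) = cos((j + 0.5 - H/2) * pi / H): ~1 at the equator,
        # ~0 at the poles, matching the ERP stretching ratio.
        weights = np.cos((np.arange(h) + 0.5 - h / 2) * np.pi / h)
        weights = np.broadcast_to(weights[:, None], (h, w))
        err = (ref.astype(np.float64) - dist.astype(np.float64)) ** 2
        wmse = np.sum(weights * err) / np.sum(weights)
        return 10.0 * np.log10(max_val ** 2 / wmse)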


Summary

INTRODUCTION

As a new means of simulation and interaction, virtual reality (VR) has attracted increasing attention in recent years [1]. Yu et al. [22] projected the pixels of the original and distorted panoramic video planes onto a sphere and uniformly sampled a large number of points on the spherical surface to calculate the PSNR. They proposed two indicators, S-PSNR and L-PSNR, which differ in whether higher weight is given to the equator. The non-local neural network module [25] endows the feature maps in the network with attention information, so that the global time domain information of the panoramic video can be extracted together with the spherical CNN; a hedged sketch of such a block follows this paragraph. We elaborate on the characteristics of panoramic video and related work (Section II), describe our method (Section III), evaluate it through extensive experiments (Section IV), and draw conclusions and discuss future directions (Section V).
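To make the non-local idea concrete, here is a hedged PyTorch sketch of a non-local block in the embedded-Gaussian form of [25], applied to a spatiotemporal feature map of shape (B, C, T, H, W). The channel reduction to C/2 and the residual placement follow common practice for such blocks; where exactly the paper inserts it into its spherical CNN is not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NonLocalBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.inter = max(channels // 2, 1)
            self.theta = nn.Conv3d(channels, self.inter, kernel_size=1)
            self.phi = nn.Conv3d(channels, self.inter, kernel_size=1)
            self.g = nn.Conv3d(channels, self.inter, kernel_size=1)
            self.out = nn.Conv3d(self.inter, channels, kernel_size=1)

        def forward(self, x):
            b, c, t, h, w = x.shape
            n = t * h * w
            theta = self.theta(x).view(b, self.inter, n).transpose(1, 2)  # (B, N, C')
            phi = self.phi(x).view(b, self.inter, n)                      # (B, C', N)
            g = self.g(x).view(b, self.inter, n).transpose(1, 2)          # (B, N, C')
            # Attention over ALL space-time positions at once: this is what
            # lets the block capture the global temporal dependencies that
            # local convolutions miss.
            attn = F.softmax(theta @ phi, dim=-1)                         # (B, N, N)
            y = (attn @ g).transpose(1, 2).reshape(b, self.inter, t, h, w)
            return x + self.out(y)  # residual: the block can be inserted anywhere

    feat = torch.randn(1, 64, 8, 16, 32)   # toy (B, C, T, H, W) feature map
    print(NonLocalBlock(64)(feat).shape)   # torch.Size([1, 64, 8, 16, 32])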

Spatial Domain Characteristics of Panoramic Video
Global Time Domain Information Extraction of Panoramic Video
General Idea of VRVQA
PROPOSED METHOD
Preprocessing
Spherical CNN
Non-local Neural Networks
Network Design and Training
Datasets
Experimental Setups
Performance Evaluation
Module Comparison Evaluation
Distortion Type Evaluation
Objective Score
CONCLUSION
