Abstract

Live virtual reality (VR) streaming (a.k.a. 360-degree video streaming) has become increasingly popular because of the rapid growth of head-mounted displays and 5G network deployment. However, the huge bandwidth and energy required to deliver live VR frames in a wireless video sensor network (WVSN) become bottlenecks, hindering wider deployment of the application. To address the bandwidth and energy challenges, VR video viewport prediction has been proposed as a feasible solution. However, existing works mainly focus on bandwidth usage and prediction accuracy while ignoring the resource consumption of the server. In this study, we propose a lightweight neural network-based viewport prediction method for live VR streaming in WVSN to overcome these problems. In particular, we (1) use a compressed channel lightweight network (C-GhostNet) to reduce the parameters of the whole model and (2) use an improved gated recurrent unit module (GRU-ECA) and C-GhostNet to process the video data and head movement data separately to improve the prediction accuracy. To evaluate the performance of our method, we conducted extensive experiments using an open VR user dataset. The experimental results demonstrate that our method achieves significant server resource savings, real-time performance, and high prediction accuracy, while maintaining low bandwidth usage and low energy consumption in WVSN, meeting the requirements of live VR streaming.
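The page does not include code, so the following is only a minimal, illustrative PyTorch-style sketch of the two-branch design the abstract describes: a GhostNet-style lightweight CNN (standing in for C-GhostNet) encoding video frames, and a GRU whose output is reweighted by ECA-style channel attention (standing in for GRU-ECA) encoding head-movement history. All class names, dimensions, and the tile-score output head are assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D conv over globally pooled channel responses."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                           # x: (batch, channels)
        w = self.conv(x.unsqueeze(1)).squeeze(1)    # (batch, channels) attention logits
        return x * torch.sigmoid(w)                 # channel-wise reweighting

class ViewportPredictor(nn.Module):
    """Hypothetical two-branch predictor: frame features + head-movement history."""
    def __init__(self, frame_backbone, frame_dim=128, hidden=64, num_tiles=24):
        super().__init__()
        self.backbone = frame_backbone              # e.g., a GhostNet-style lightweight CNN
        self.gru = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        self.eca = ECA(hidden)
        self.head = nn.Linear(frame_dim + hidden, num_tiles)  # per-tile viewport scores

    def forward(self, frames, head_traj):
        # frames: (batch, C, H, W); head_traj: (batch, T, 3) yaw/pitch/roll history
        f = self.backbone(frames)                   # (batch, frame_dim) frame features
        _, h = self.gru(head_traj)                  # h: (1, batch, hidden)
        h = self.eca(h.squeeze(0))                  # ECA-weighted motion features
        return torch.sigmoid(self.head(torch.cat([f, h], dim=1)))
```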

Highlights

  • In recent years, with the increasing demand for immersive multimedia experiences, virtual reality (VR) video streaming (a.k.a., 360-degree video streaming) has become increasingly popular

  • To save server resources and effectively address the real-time, prediction-accuracy, and bandwidth challenges, we propose a lightweight neural network-based viewport prediction method for live virtual reality streaming in the wireless video sensor network (WVSN). The method uses an alternate and hybrid deep learning approach to achieve accurate, real-time viewport prediction with low server resource consumption and low bandwidth usage

  • The video is divided into segments by the packer, and the segments are distributed to the client through the content distribution network (CDN) using the optimized viewport-adaptive 360-degree video streaming mechanism [25]; a rough sketch of viewport-adaptive quality assignment follows below
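The mechanism of [25] is not detailed on this page, so the sketch below is only a simplified illustration of viewport-adaptive streaming in general: each segment is assumed to be split into spatial tiles, tiles overlapping the predicted viewport are requested at a high bitrate, and the rest at a low bitrate. The tile grid, bitrate ladder, and all function names are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Assumed tiling: the equirectangular frame is split into a 6x4 grid of tiles.
TILE_COLS, TILE_ROWS = 6, 4
HIGH_KBPS, LOW_KBPS = 8000, 1000   # hypothetical two-level bitrate ladder

@dataclass
class TileRequest:
    tile_id: int
    bitrate_kbps: int

def tiles_in_viewport(yaw_deg: float, pitch_deg: float,
                      fov_h: float = 100.0, fov_v: float = 90.0) -> set:
    """Return ids of tiles whose centers fall inside the predicted viewport."""
    inside = set()
    for row in range(TILE_ROWS):
        for col in range(TILE_COLS):
            center_yaw = (col + 0.5) * 360.0 / TILE_COLS - 180.0
            center_pitch = 90.0 - (row + 0.5) * 180.0 / TILE_ROWS
            d_yaw = (center_yaw - yaw_deg + 180.0) % 360.0 - 180.0  # handle wrap-around
            if abs(d_yaw) <= fov_h / 2 and abs(center_pitch - pitch_deg) <= fov_v / 2:
                inside.add(row * TILE_COLS + col)
    return inside

def plan_segment(yaw_deg: float, pitch_deg: float) -> List[TileRequest]:
    """High quality inside the predicted viewport, low quality outside."""
    viewport = tiles_in_viewport(yaw_deg, pitch_deg)
    return [TileRequest(t, HIGH_KBPS if t in viewport else LOW_KBPS)
            for t in range(TILE_COLS * TILE_ROWS)]

# Example: plan the next segment for a viewport predicted at yaw=30°, pitch=0°
requests = plan_segment(30.0, 0.0)
```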


Summary

Introduction

With the increasing demand for immersive multimedia experiences, virtual reality (VR) video streaming (a.k.a. 360-degree video streaming) has become increasingly popular. Because the field of view (FoV) covered by a 360-degree video is up to 360°, compared with less than 50° for traditional 2D video, the data volume of a 360-degree video is more than 5 times larger than that of a two-dimensional video at the same resolution and length (see the rough estimate below). Faced with such a huge amount of data, under existing wireless network bandwidth conditions, it is still difficult to transmit a 360-degree video with sufficient resolution covering the complete 360-degree scene, even if the latest H.265/HEVC and other standards are used for compression and encoding. Viewport-adaptive streaming addresses this: the part inside the predicted viewport is coded with high quality, and the part outside the predicted viewport is coded with low quality [26,27,28,29]. In this way, even if the predicted viewport is not accurate, users can still watch a low-quality video rather than have the VR experience interrupted.
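As a rough back-of-the-envelope check on the "more than 5 times" figure (our own estimate, not taken from the paper): at the same angular resolution, covering the full horizontal field alone already implies

$$\frac{360^\circ}{50^\circ} = 7.2 > 5,$$

and accounting for the vertical dimension (180° for the full sphere versus the much narrower vertical FoV of a conventional frame) only increases the ratio further.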
