Abstract

360-degree video has shown great potential to enter the mainstream owing to its immersive experience. However, 360-degree video streaming requires ultrahigh bandwidth and low latency, which limits improvements in user quality of experience (QoE). Combining field of view (FoV) prediction with adaptive video streaming is currently an effective way to address these issues. However, existing FoV prediction methods based on recurrent neural networks (RNNs) cannot capture long-range dependencies between input and output, and current deep reinforcement learning (DRL)-based adaptive strategies neither estimate future bandwidth with high accuracy nor fully exploit the capabilities of VR devices. To address these limitations, we design a DRL-based 360-degree video streaming method named VRFormer with combined FoV prediction and super resolution (SR). First, we adopt a content-aware transformer-based encoder-decoder network for long-term FoV prediction, which combines the user's head-movement history, eye-tracking history, and user attention extracted by a convolutional neural network (CNN)-based network. Second, we introduce a deep neural network (DNN)-based SR network running on the VR device to reconstruct high-definition video content. Finally, we apply a DRL-based network to adaptively allocate bitrates for future tiles and dynamically control video content reconstruction. Experiments verify that the proposed method effectively improves user QoE compared to state-of-the-art methods.
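
The following is a minimal sketch of the kind of content-aware, transformer-based FoV predictor the abstract describes, written in PyTorch under assumed names and dimensions (FoVPredictor, hist_len, pred_len, and a 64-dimensional pooled content-attention feature from a CNN); the actual VRFormer architecture and hyperparameters may differ.

```python
import torch
import torch.nn as nn

class FoVPredictor(nn.Module):
    """Sketch: fuse head-movement history, gaze history, and CNN content
    attention, then predict future head orientations with a transformer
    encoder-decoder. Dimensions are illustrative assumptions."""

    def __init__(self, d_model=128, hist_len=30, pred_len=30):
        super().__init__()
        self.pred_len = pred_len
        # Per-timestep features: head orientation (3), gaze point (2),
        # and an assumed 64-dim pooled content-attention vector.
        self.embed = nn.Linear(3 + 2 + 64, d_model)
        self.pos = nn.Parameter(torch.zeros(hist_len + pred_len, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        # Learned queries for the future horizon (decoder input).
        self.query = nn.Parameter(torch.zeros(pred_len, d_model))
        self.head = nn.Linear(d_model, 3)  # predicted (yaw, pitch, roll)

    def forward(self, head_hist, gaze_hist, content_attn):
        # head_hist: (B, T, 3), gaze_hist: (B, T, 2), content_attn: (B, T, 64)
        x = torch.cat([head_hist, gaze_hist, content_attn], dim=-1)
        src = self.embed(x) + self.pos[: x.size(1)]
        tgt = (self.query.unsqueeze(0).expand(x.size(0), -1, -1)
               + self.pos[x.size(1): x.size(1) + self.pred_len])
        out = self.transformer(src, tgt)
        return self.head(out)  # (B, pred_len, 3) future viewpoints

if __name__ == "__main__":
    # Dummy usage: 4 users, 30 history steps, 30 predicted steps.
    model = FoVPredictor()
    head = torch.randn(4, 30, 3)
    gaze = torch.randn(4, 30, 2)
    attn = torch.randn(4, 30, 64)
    print(model(head, gaze, attn).shape)  # torch.Size([4, 30, 3])
```

The predicted viewpoints would then drive tile selection, with the DRL policy choosing per-tile bitrates and deciding when the on-device SR network should upscale low-bitrate tiles instead of fetching higher-rate ones.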
