Machine Learning Based Content-Agnostic Viewport Prediction for 360-Degree Video

Abstract

Accurate and fast predictions of the (near-)future position of head-mounted device users within a virtual omnidirectional environment open up many opportunities in application domains such as interactive immersive gaming and tele-surgery. Consequently, recent years have seen growing attention to models for viewport prediction in 360° environments. Among the existing approaches, content-agnostic, trajectory-based methods can potentially provide very fast solutions, as they do not require complex analysis of the videos to make a prediction. However, accurate trajectory-based viewport prediction is difficult because of the intrinsic variability in user behaviour. Furthermore, even when they use machine learning, current approaches tend to be brute-force and heavily tailored to specific datasets, with little comparison to existing benchmarks or publicly available studies. This article presents a generic, content-agnostic viewport prediction method that combines a window-based approach with a preprocessing system that classifies behavioural patterns in terms of user clustering and trajectory correlation. Moreover, because the state of the art lacks a comparative analysis of different approaches, this work provides one. Based on the results obtained, a combined prediction model is proposed and evaluated. Our method shows a 36.8% to 53.9% improvement over the static prediction baseline for a prediction horizon of 8 seconds. In addition, it achieves an 11.5% to 24.0% improvement over a brute-force machine learning prediction approach. As such, this work contributes towards more generic and structured solutions for content-agnostic viewport prediction in terms of data representation, preprocessing and modelling.
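
The two evaluation quantities named above, a static prediction baseline and a percentage improvement over it, can be illustrated with a minimal Python sketch. It assumes that head orientation is stored as (yaw, pitch) in radians and that prediction error is measured as the great-circle (orthodromic) distance on the unit sphere; these choices and the names static_baseline, orthodromic_distance and relative_improvement are illustrative assumptions, not the article's implementation.

```python
import numpy as np

def orthodromic_distance(yaw1, pitch1, yaw2, pitch2):
    """Great-circle distance (radians) between two viewing directions.

    Assumption: yaw acts as longitude and pitch as latitude, both in radians.
    """
    cos_d = (np.sin(pitch1) * np.sin(pitch2)
             + np.cos(pitch1) * np.cos(pitch2) * np.cos(yaw1 - yaw2))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos_d, -1.0, 1.0))

def static_baseline(history, horizon):
    """Hold the last observed (yaw, pitch) sample for every future step."""
    last = np.asarray(history, dtype=float)[-1]
    return np.tile(last, (horizon, 1))

def relative_improvement(baseline_error, model_error):
    """Percentage error reduction of a model with respect to a baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Toy example: a user turning steadily in yaw (hypothetical data).
history = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
future = np.array([(0.3, 0.0), (0.4, 0.0)])
pred = static_baseline(history, horizon=len(future))
baseline_err = orthodromic_distance(pred[:, 0], pred[:, 1],
                                    future[:, 0], future[:, 1]).mean()
```

Any trajectory-based predictor evaluated on the same error measure can then be compared with relative_improvement(baseline_err, model_err).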

Similar Papers
  • Research Article
  • Cited by 1
  • 10.54254/2977-3903/2024.19435
A MDA-based multi-modal fusion model for panoramic viewport prediction
  • Dec 26, 2024
  • Advances in Engineering Innovation
  • Jinghao Lyu

Fusion techniques are of considerable importance in multi-modal viewport prediction. The latest attention-based fusion methods have been shown to achieve high prediction accuracy. However, these methods fail to account for the differing information density of the three modalities involved in viewport prediction: trajectory, visual, and audio. The visual and audio modalities carry primitive signal information, whereas the trajectory modality carries higher-level time-series information. In this paper, a viewport prediction framework based on a Modality Diversity-Aware (MDA) fusion network is proposed to achieve multi-modal feature interaction. First, we design a fusion module that combines the visual and auditory modalities, strengthening them as complementary high-level features. We then use cross-modal attention to integrate the fused visual-audio information with the trajectory features. Our method addresses the differing information densities of the three modalities, ensuring a fair and effective interaction between them. To evaluate the efficacy of the proposed approach, we conducted experiments on a widely used public dataset. The experiments demonstrate that our approach predicts viewport areas accurately while significantly reducing the number of model parameters.
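
The cross-modal attention step described above can be sketched as generic single-head scaled dot-product cross-attention, with trajectory features as queries and fused visual-audio features as keys and values. This is not the MDA fusion module itself: the feature dimensions and the randomly initialised projection matrices w_q, w_k and w_v are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(traj, av, w_q, w_k, w_v):
    """Trajectory features attend to fused visual-audio features.

    traj: (T, d_traj) trajectory features used as queries.
    av:   (S, d_av) fused visual-audio features used as keys and values.
    """
    q = traj @ w_q                              # (T, d)
    k = av @ w_k                                # (S, d)
    v = av @ w_v                                # (S, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (T, S)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ v, weights                 # (T, d), (T, S)

# Hypothetical sizes: 5 trajectory steps, 7 visual-audio tokens.
rng = np.random.default_rng(0)
traj = rng.normal(size=(5, 3))
av = rng.normal(size=(7, 16))
w_q, w_k, w_v = (rng.normal(size=(3, 8)), rng.normal(size=(16, 8)),
                 rng.normal(size=(16, 8)))
out, w = cross_modal_attention(traj, av, w_q, w_k, w_v)
```

Letting the trajectory stream issue the queries means each future-position estimate decides how much of the fused visual-audio context to draw on.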

  • Conference Article
  • 10.1109/ism.2020.00020
Redefine the A in ABR for 360-degree Videos: A Flexible ABR Framework
  • Dec 1, 2020
  • Kuan-Ying Lee + 3 more

360-degree video has become popular because of the immersive experience it provides to the viewer. While watching, viewers can control the field of view (FoV) within a range of 360° by 180° (in this paper, we use viewport interchangeably with FoV). As this trend continues, adaptive bitrate (ABR) streaming is becoming an increasingly important issue. Most existing ABR algorithms for 360 videos (360 ABR algorithms) require real-time head traces and a certain amount of computation from the client for streaming, which considerably limits the potential audience. Moreover, although a growing number of 360 ABR algorithms rely on machine learning (ML) for viewport prediction, ML and ABR have largely developed as independent research topics. In this paper, we propose a two-fold ABR algorithm for 360 video streaming that uses 1) an off-the-shelf ABR algorithm for ordinary videos and 2) an off-the-shelf viewport prediction model. Our algorithm requires neither real-time head traces nor additional computation from the viewing device. In addition, it adapts easily to the newest developments in viewport prediction and ABR. As a consequence, the proposed method fits naturally into the existing streaming framework, and any advance in viewport prediction or ABR can enhance its performance. Quantitative experiments demonstrate that the proposed method achieves twice the quality of experience (QoE) of the baseline.

  • Research Article
  • 10.1145/3730402
VPFormer: Leveraging Transformer with Voxel Integration for Viewport Prediction in Volumetric Video
  • Jun 30, 2025
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Jie Li + 7 more

With the continuous advancement of computer vision and image processing technologies, volumetric video, represented by point cloud video, holds the potential for extensive applications in areas such as Virtual Reality (VR) and Augmented Reality (AR). Viewport prediction, also referred to as Field of View (FoV) prediction, is a crucial component of emerging VR and AR applications and plays a vital role in the transmission of point cloud videos. Currently, models for viewpoint prediction that integrate feature extraction and FoV information rely heavily on spatial-temporal features extracted by convolutional neural networks. However, 3D convolution cannot effectively capture long-term spatial-temporal dependencies within videos. Moreover, the temporal contrast layer used for temporal feature extraction compares features only within each block, which leads to matching errors and inaccurate temporal features and, consequently, lower predictive performance. To address these limitations, we propose a Transformer-based Volumetric Point Cloud Video Viewport Prediction Network (VPFormer) that efficiently extracts spatial-temporal features from point cloud videos. VPFormer is a viewport prediction framework that combines the spatial-temporal features of point cloud videos with user trajectory information. Specifically, we introduce a novel sampling method that preserves spatial-temporal information while reducing computational complexity. Additionally, we incorporate context-aware dynamic positional encoding to capture inter-frame spatial-temporal context. We then introduce a voxel-based temporal contrast layer that partitions the point cloud into smaller voxel blocks during feature matching, significantly reducing matching errors and improving the extraction of temporal features.
Finally, by combining the spatial-temporal features of point cloud videos with user head trajectory information, we predict future user viewpoints. Experimental results demonstrate that this approach outperforms other solutions.
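
The partitioning of a point cloud into voxel blocks mentioned above can be sketched as a plain uniform-grid voxelization. This only shows the grouping step, under the assumption of cubic voxels anchored at the bounding-box corner; the voxel size is a hypothetical parameter, and VPFormer's feature extraction and temporal contrast layer are not reproduced.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Group 3D points into cubic voxel blocks.

    Returns the integer coordinates of every occupied voxel and, for each
    point, the index of the voxel it falls into.
    """
    points = np.asarray(points, dtype=float)
    origin = points.min(axis=0)
    # Integer grid coordinates of each point relative to the bounding-box corner.
    coords = np.floor((points - origin) / voxel_size).astype(np.int64)
    voxels, point_to_voxel = np.unique(coords, axis=0, return_inverse=True)
    # reshape(-1) keeps the inverse index 1D across NumPy versions.
    return voxels, point_to_voxel.reshape(-1)
```

Per-voxel features (for example, the mean of the points in each block) can then be matched between consecutive frames block by block rather than over the whole cloud.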

  • Research Article
  • 10.3390/electronics14183743
Learning-Based Viewport Prediction for 360-Degree Videos: A Review
  • Sep 22, 2025
  • Electronics
  • Mahmoud Z A Wahba + 2 more

Virtual reality is now being widely adopted, and its popularity is expected to grow over the next few decades. A substantial portion of virtual reality content consists of 360-degree videos, which surround users with the video content and let them explore it without limitations. However, 360-degree videos are extremely demanding in terms of storage and streaming requirements. At the same time, users cannot view the entire 360-degree content at once because of the inherent limitations of the human visual system. For this reason, viewport prediction techniques have been proposed: they aim to forecast where the user will look, thus allowing only the viewport content to be transmitted or different quality levels to be assigned to viewport and non-viewport regions. In this context, artificial intelligence plays a pivotal role in the development of high-performance viewport prediction solutions. In this work, we analyze the evolution of viewport prediction based on machine and deep learning techniques over the last decade, classifying approaches by the employed processing technique as well as by their input and output formats. Our review identifies common gaps in existing approaches, thus paving the way for future research. Greater viewport prediction accuracy and reliability will foster the adoption of virtual reality content in real-life scenarios.
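
Assigning different quality levels to viewport and non-viewport regions, as described above, can be sketched for a tiled equirectangular video. The tile grid, the FoV size and the rule that a tile counts as "in view" when its centre lies inside the predicted viewport are all hypothetical simplifications (they ignore spherical distortion near the poles).

```python
import numpy as np

def wrap_angle(deg):
    """Map an angle difference in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def assign_tile_quality(pred_yaw, pred_pitch, n_cols=8, n_rows=4,
                        fov_h=100.0, fov_v=90.0):
    """Label equirectangular tiles 'high' if their centre is in the predicted viewport.

    Assumption: angles in degrees, yaw in [-180, 180), pitch in [-90, 90].
    Returns an (n_rows, n_cols) array of 'high'/'low' labels.
    """
    yaw_c = -180.0 + (np.arange(n_cols) + 0.5) * 360.0 / n_cols
    pitch_c = 90.0 - (np.arange(n_rows) + 0.5) * 180.0 / n_rows
    # Wrap yaw differences so tiles across the +/-180 seam are handled.
    d_yaw = np.abs(wrap_angle(yaw_c[None, :] - pred_yaw))    # (1, n_cols)
    d_pitch = np.abs(pitch_c[:, None] - pred_pitch)          # (n_rows, 1)
    in_view = (d_yaw <= fov_h / 2) & (d_pitch <= fov_v / 2)  # (n_rows, n_cols)
    return np.where(in_view, "high", "low")
```

A streaming client would then request the "high" tiles at a high bitrate and the remaining tiles at a lower one.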

  • Conference Article
  • 10.1109/cyberc.2010.102
Residue Amending Combined BP Prediction Model
  • Oct 1, 2010
  • Wang Zhe + 2 more

The thesis introduces the grey system model and the BP neural network. By making full use of the merits of the GM(1,1) and neural network models while overcoming their drawbacks, we construct a grey residue-amending combined prediction model based on a BP neural network, of the form "combined prediction model = tendency prediction model/GM(1,1) + neural network model", and compare the three models in terms of prediction and precision. The results indicate that the combined model achieves higher precision and smaller error than the single models.

  • Conference Article
  • 10.1109/cise.2010.5677007
Residue Amending Combined BP Prediction Model
  • Dec 1, 2010
  • Zhe Wang + 2 more

The thesis introduces the grey system model and the BP neural network. By making full use of the merits of the GM(1,1) and neural network models while overcoming their drawbacks, we construct a grey residue-amending prediction model based on a BP neural network, of the form combined prediction model = tendency prediction model/GM(1,1) + neural network model, and compare the three models in terms of prediction and precision. The results indicate that the model achieves higher precision and smaller error than the single models.
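
The tendency component of the residue-amending models above is the GM(1,1) grey model. The sketch below implements the standard GM(1,1) fit and forecast (accumulated generating operation, least-squares estimation of the coefficients a and b, inverse accumulation). The BP-network residual correction used in these papers is deliberately omitted, and the function names are hypothetical.

```python
import numpy as np

def gm11_fit(x0):
    """Fit GM(1,1); returns the development coefficient a and grey input b."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                        # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])             # background values (neighbour means)
    B = np.column_stack([-z1, np.ones_like(z1)])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return a, b

def gm11_predict(x0, n_ahead):
    """Fitted values for the observed series followed by n_ahead forecasts."""
    x0 = np.asarray(x0, dtype=float)
    a, b = gm11_fit(x0)
    k = np.arange(len(x0) + n_ahead)
    # Time-response function of the whitened equation dx1/dt + a*x1 = b.
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # Inverse AGO recovers the original-scale series.
    return np.concatenate([[x0[0]], np.diff(x1_hat)])
```

In a residue-amending scheme, the residuals x0 - gm11_predict(x0, 0) would be the training target of the correcting neural network.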

  • Research Article
  • Cited by 13
  • 10.1016/j.egyr.2023.04.326
A combined wind speed prediction model based on data processing, multi-objective optimization and machine learning
  • Apr 27, 2023
  • Energy Reports
  • He Wang + 3 more

  • Research Article
  • Cited by 3
  • 10.19725/j.cnki.1007-2322.2021.0073
Wind Power Combination Prediction Model Based on Time Series Decomposition and Machine Learning
  • Dec 10, 2021
  • Dongmei Zhao + 4 more

Accurate wind power prediction can increase the grid-connected capacity of wind power while maintaining the stable and secure operation of the power grid. To improve the prediction accuracy of wind power, a dual-level combined prediction model integrating time series decomposition, machine learning and a heuristic algorithm was proposed. First, a prediction model combining empirical mode decomposition with a long short-term memory network (EMD-LSTM) was constructed. In parallel, a prediction model combining variational mode decomposition and the simulated annealing algorithm (VMD-SA) with a deep belief network (DBN) was proposed. The EMD-LSTM and VMD-SA-DBN models were used as the base prediction models in the upper layer of the combined model. Second, the extreme gradient boosting (XGBoost) algorithm was used to construct the lower layer of the combined model: the predictions of the two base models in the upper layer were fed into the lower-layer model to obtain the final prediction. Finally, the effectiveness of the proposed algorithm was verified on measured data. The results show that the proposed two-layer combined model achieves higher prediction accuracy than the single prediction models.
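
The two-layer structure above, where upper-layer base models produce predictions that a lower-layer model combines, is a form of stacking. To keep the sketch self-contained, it swaps the XGBoost lower layer for a least-squares linear meta-learner; the base predictions are synthetic stand-ins for the EMD-LSTM and VMD-SA-DBN outputs, and the function names are hypothetical.

```python
import numpy as np

def fit_meta_learner(base_preds, y):
    """Learn weights (plus intercept) mapping base-model predictions to the target.

    base_preds: (n_samples, n_models) upper-layer predictions.
    Assumption: a linear meta-learner stands in for XGBoost.
    """
    X = np.column_stack([base_preds, np.ones(len(base_preds))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_meta(coef, base_preds):
    """Apply the fitted meta-learner to new base-model predictions."""
    X = np.column_stack([base_preds, np.ones(len(base_preds))])
    return X @ coef

# Synthetic base-model outputs and a target built from them.
rng = np.random.default_rng(1)
p1, p2 = rng.normal(size=50), rng.normal(size=50)
y = 0.3 * p1 + 0.7 * p2 + 1.0
coef = fit_meta_learner(np.column_stack([p1, p2]), y)
```

In practice the meta-learner should be trained on out-of-fold base predictions, so that it does not learn from base models that have already seen the same targets.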

  • Research Article
  • Cited by 2
  • 10.1145/3701734
A Clustering Approach to Unveil User Similarities in 6DoF Extended Reality Applications
  • Sep 11, 2025
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Silvia Rossi + 3 more

The advent of Extended Reality (XR) technologies, such as Virtual and Augmented Reality, in our daily life has led to the rise of user-centric systems that offer a higher level of interaction and presence in virtual environments. In this context, understanding how users actually interact is still an open challenge and a key step towards enabling user-centric systems. In this work, our goal is to construct an efficient clustering tool for 6DoF navigation trajectories by extending the applicability of an existing behavioural tool. Specifically, we first compare navigation in 6DoF with its 3DoF counterpart, highlighting the main differences and novelties. Then, we investigate new metrics aimed at better modelling behavioural similarities between users in a 6DoF system. More concretely, we define and compare 11 similarity metrics based on different distance features (i.e., user positions in 3D space and user viewing directions) and distance measurements (i.e., Euclidean, geodesic and angular distance). Our solutions are validated and tested on real navigation paths of users interacting with dynamic volumetric media in both 6DoF Virtual Reality and Augmented Reality conditions. Results show that metrics based on both user position and viewing direction perform better at detecting user similarity during 6DoF navigation. Such easy-to-use yet robust metrics allow us to answer a fundamental question for user-centric systems: 'How do we detect if users look at the same content in 6DoF?', opening the door to new solutions based on user interactivity, such as viewport prediction, live streaming services optimised for user behaviour, and user-based quality assessment methods.
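
A similarity metric that uses both user position and viewing direction, as the results above favour, can be sketched as a weighted sum of a Euclidean position gap and an angular direction gap. The weighting alpha and the direct mixing of metres with radians are hypothetical choices for illustration; the paper's 11 metrics are not reproduced here, and in practice the two terms would need normalising to comparable scales.

```python
import numpy as np

def angular_distance(d1, d2):
    """Angle (radians) between two viewing directions given as 3D vectors."""
    d1 = np.asarray(d1, dtype=float) / np.linalg.norm(d1)
    d2 = np.asarray(d2, dtype=float) / np.linalg.norm(d2)
    return np.arccos(np.clip(np.dot(d1, d2), -1.0, 1.0))

def user_distance(pos1, dir1, pos2, dir2, alpha=0.5):
    """Hypothetical 6DoF dissimilarity mixing position and viewing-direction gaps.

    alpha weights the Euclidean position term; (1 - alpha) weights the angle.
    """
    d_pos = np.linalg.norm(np.asarray(pos1, dtype=float) - np.asarray(pos2, dtype=float))
    d_dir = angular_distance(dir1, dir2)
    return alpha * d_pos + (1 - alpha) * d_dir
```

Pairwise values of such a distance could feed a standard clustering algorithm to group users who look at the same content.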

  • Research Article
  • Cited by 35
  • 10.1021/acs.jcim.2c01134
Computational Predictions of Nonclinical Pharmacokinetics at the Drug Design Stage.
  • Jan 3, 2023
  • Journal of Chemical Information and Modeling
  • Raya Stoyanova + 7 more

Although computational predictions of pharmacokinetics (PK) are desirable at the drug design stage, existing approaches are often limited in prediction accuracy and human interpretability. Using a discovery data set of mouse and rat PK studies at Roche (9,685 unique compounds), we performed a proof-of-concept study to predict key PK properties from chemical structure alone, including plasma clearance (CLp), volume of distribution at steady state (Vss), and oral bioavailability (F). Ten machine learning (ML) models were evaluated, including single-task, multitask, and transfer learning approaches (i.e., pretraining with in vitro data). In addition to prediction accuracy, we emphasized the human interpretability of outcomes, especially the quantification of uncertainty, applicability domains, and explanations of predictions in terms of molecular features. Results show that intravenous (IV) PK properties (CLp and Vss) can be predicted with good precision (average absolute fold error, AAFE, of 1.96-2.84 depending on the data split) and low bias (average fold error, AFE, of 0.98-1.36), with AutoGluon, Gaussian Process Regressor (GP), and ChemProp displaying the best performance. Owing to the higher complexity of oral PK studies, predictions of F were more challenging, with best AAFE values of 2.35-2.60 and a higher overprediction bias (AFE of 1.45-1.62). Multitask approaches and pretraining of ChemProp neural networks with in vitro data showed similar precision to single-task models but helped reduce the bias and increase correlations between observations and predictions. A combination of GP-computed prediction variance, molecular clustering, and dimensionality reduction provided valuable quantitative insights into prediction uncertainty and applicability domains. SHapley Additive exPlanations (SHAP) highlighted molecular features contributing to the predicted Vss, providing explanations that could aid drug design.
Combined results show that computational predictions of PK are feasible at the drug design stage, with several ML technologies converging to successfully leverage historical PK data sets. Further studies are needed to unlock the full potential of this approach, especially with respect to data set sizes and quality, transfer learning between in vitro and in vivo data sets, model-independent quantification of uncertainty, and explainability of predictions.
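
The two error measures reported above can be computed as follows, assuming the definitions commonly used in the PK literature: AFE is 10 raised to the mean of log10(predicted/observed), so values above 1 indicate overprediction bias, and AAFE uses the absolute log ratio, so it measures precision regardless of direction. The abstract does not spell out its formulas, so these definitions are an assumption.

```python
import numpy as np

def afe(pred, obs):
    """Average fold error: geometric-mean ratio of predicted to observed (bias)."""
    log_ratio = np.log10(np.asarray(pred, dtype=float) / np.asarray(obs, dtype=float))
    return 10 ** np.mean(log_ratio)

def aafe(pred, obs):
    """Average absolute fold error: geometric-mean fold deviation (precision)."""
    log_ratio = np.log10(np.asarray(pred, dtype=float) / np.asarray(obs, dtype=float))
    return 10 ** np.mean(np.abs(log_ratio))
```

A model that overpredicts one compound two-fold and underpredicts another two-fold has an AFE of 1 (no net bias) but an AAFE of 2, which is why both values are reported.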

  • Research Article
  • Cited by 12
  • 10.3390/app12167978
A Two-Stage Decomposition-Reinforcement Learning Optimal Combined Short-Time Traffic Flow Prediction Model Considering Multiple Factors
  • Aug 9, 2022
  • Applied Sciences
  • Dayi Qu + 3 more

Accurate short-term traffic flow prediction is a prerequisite for an intelligent transportation system that proactively alleviates traffic congestion. Because the complex and variable traffic environment gives traffic flow strongly non-linear characteristics that make it difficult to improve prediction accuracy, a combined prediction model is proposed that reduces the non-stationarity of traffic flow and fully extracts its features. First, the traffic flow data are decomposed into multiple components using seasonal and trend decomposition using Loess (STL); since these components contain different features, an optimized variational mode decomposition (VMD) is applied as a second decomposition to the component with large fluctuation frequencies. The components are then reconstructed according to fuzzy entropy and the Lempel-Ziv complexity index, and the Pearson correlation coefficient is used to filter the traffic flow features. Next, a light gradient boosting machine (LightGBM), a long short-term memory network with an attention mechanism (LA), and a kernel extreme learning machine optimized with a genetic algorithm (GA-KELM) are built for prediction. Finally, reinforcement learning is used to integrate the advantages of each model by determining the model weights that give the best prediction results. The case study shows that the proposed model outperforms other models in predicting urban road traffic flow, with a mean absolute error of 2.622 and a root mean square error of 3.479, both lower than the errors of the other models, indicating that the model can fully extract the features of complex traffic flow.
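
The Pearson-correlation feature filter mentioned above can be sketched in a few lines. It keeps the candidate features whose absolute correlation with the target reaches a threshold; the threshold value and the synthetic features are hypothetical, and the STL/VMD decomposition and reinforcement-learning weighting stages are outside the scope of this sketch.

```python
import numpy as np

def pearson_filter(features, target, threshold=0.5):
    """Keep feature columns whose |Pearson r| with the target is >= threshold.

    Returns the filtered features, the boolean keep-mask and all r values.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    r = np.array([np.corrcoef(features[:, j], target)[0, 1]
                  for j in range(features.shape[1])])
    keep = np.abs(r) >= threshold
    return features[:, keep], keep, r

# Synthetic example: one strongly related, one unrelated, one inversely related feature.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
feats = np.column_stack([2 * t + 0.1 * rng.normal(size=200),
                         rng.normal(size=200),
                         -t])
kept, keep, r = pearson_filter(feats, t)
```

Using the absolute value keeps strongly negatively correlated features, which are as informative for a regression model as positively correlated ones.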

  • Conference Article
  • Cited by 26
  • 10.1145/3359989.3365413
Analyzing viewport prediction under different VR interactions
  • Dec 3, 2019
  • Tan Xu + 2 more

In this paper, we study the problem of predicting a user's viewport movement in a networked VR system (i.e., predicting which direction the viewer will look in the near future). This knowledge is critical for guiding the VR system in making judicious content-fetching decisions, leading to efficient network bandwidth utilization (e.g., up to 35% on LTE networks, as demonstrated in our previous work) and improved Quality of Experience (QoE). For this study, we collected viewport trajectory traces from 275 users who watched popular 360° panoramic videos for a total of 156 hours. Leveraging our unique datasets, we compare the viewport movement patterns of different interaction modes: wearing a head-mounted device, tilting a smartphone, and dragging the mouse on a PC. We then apply diverse machine learning algorithms, from simple regression to sophisticated deep learning that leverages crowd-sourced data, to analyze the performance of viewport prediction. We find that the deep learning approach is robust across all interaction modes and yields superior performance, especially when the viewport is more challenging to predict, e.g., for a longer prediction window or with more dynamic movement. Overall, our analysis provides key insights into how to perform viewport prediction intelligently in networked VR systems.
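
The "simple regression" end of the spectrum above can be illustrated by fitting a least-squares line to each head-orientation axis over a recent history window and extrapolating it. The radian (yaw, pitch) representation, the window and the function name are hypothetical; yaw is unwrapped before fitting so that crossing the ±π seam is not mistaken for a jump of almost 2π.

```python
import numpy as np

def linear_regression_forecast(times, yaw, pitch, future_times):
    """Extrapolate head orientation with a per-axis least-squares line.

    Assumption: angles in radians, yaw in [-pi, pi), pitch in [-pi/2, pi/2].
    """
    yaw_u = np.unwrap(np.asarray(yaw, dtype=float))
    yaw_fit = np.polyfit(times, yaw_u, deg=1)
    pitch_fit = np.polyfit(times, pitch, deg=1)
    yaw_pred = np.polyval(yaw_fit, future_times)
    pitch_pred = np.polyval(pitch_fit, future_times)
    # Re-wrap yaw to [-pi, pi) and clamp pitch to the valid range.
    yaw_pred = (yaw_pred + np.pi) % (2 * np.pi) - np.pi
    pitch_pred = np.clip(pitch_pred, -np.pi / 2, np.pi / 2)
    return yaw_pred, pitch_pred

# Toy history: yaw turning at 0.5 rad/s across the seam, pitch rising slowly.
times = np.arange(5) * 0.1
yaw_in = (3.0 + 0.5 * times + np.pi) % (2 * np.pi) - np.pi
pitch_in = 0.1 * times
yaw_pred, pitch_pred = linear_regression_forecast(times, yaw_in, pitch_in,
                                                  np.array([0.5]))
```

Such a per-axis line is fast and reasonable for short horizons, but, in line with the findings above, it degrades for longer prediction windows and more dynamic movement.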

  • Research Article
  • Cited by 9
  • 10.1016/j.dajour.2024.100515
A performance and interpretability assessment of machine learning models for rainfall prediction in the Republic of Ireland
  • Aug 24, 2024
  • Decision Analytics Journal
  • Menatallah Abdel Azeem + 1 more

  • Research Article
  • Cited by 47
  • 10.1109/access.2022.3176619
Mobile Network Coverage Prediction Based on Supervised Machine Learning Algorithms
  • Jan 1, 2022
  • IEEE Access
  • Mohd Fazuwan Ahmad Fauzi + 3 more

The need for wider coverage and high-performance mobile networks is critical given the maturity of Internet penetration in today's society. One of the primary drivers of this demand is the dramatic shift toward digitalization caused by the Covid-19 pandemic. Meanwhile, the emergence of the 5G wireless standard and the increasingly complex operating environment of mobile networks make traditional prediction models less reliable. Given its recent advances and promising capabilities, machine learning (ML) is seen as an alternative to traditional approaches for ground-to-ground (G2G) mobile communication coverage prediction. In this study, various ML models have been tested and evaluated to develop an ML-based received signal strength prediction model for mobile networks. The challenge, however, is to identify a practical ML model that satisfies the computing speed criteria while still meeting the required prediction accuracy. A total of six categories of ML models, namely Linear Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM), Regression Trees (RT), Ensembles of Trees (ET), and Gaussian Process Regression (GPR), comprising more than 20 established algorithms/kernels, have been tested and evaluated in this paper to identify the best contender in terms of speed and accuracy. The evaluation showed that the GPR model is the most accurate model for Reference Signal Received Power (RSRP) prediction in terms of RMSE and R², followed by ET, RT, SVM, ANN and LR. Nevertheless, prediction speed and model training time are also important factors in determining the most practical model for RSRP prediction in real-world mobile network planning applications.
Finally, the ET model with the Random Forest (RF) algorithm was selected and is highly recommended as the most practical ML model for developing rigorous RSRP prediction models across multiple frequency bands and environments. The developed prediction model can be used for network analysis and optimization.
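
The two accuracy criteria used above to rank the models, RMSE and R², can be computed as in the sketch below. The RSRP values in the test are hypothetical dBm figures, and the helper names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (e.g. dBm for RSRP)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

RMSE expresses the typical error in dB, while R² expresses how much of the variation in measured RSRP the model explains, so the two can rank models differently.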

  • Research Article
  • 10.1158/1538-7445.sabcs19-p1-10-29
Abstract P1-10-29: Radiomics improved pre-therapeutic prediction of breast cancers insensitive to neoadjuvant chemotherapy
  • Feb 14, 2020
  • Cancer Research
  • Xuezhi Zhou + 5 more

Background: Approximately 10–35% of breast cancers were found to be insensitive to neoadjuvant chemotherapy (NAC), and approximately 5% of patients had larger tumors after NAC; in these patients, NAC failed to have a therapeutic effect and instead delayed surgical treatment. It is therefore critical to identify predictive biomarkers that improve patient selection for NAC. Here we report a radiomic model for the pre-therapeutic prediction of breast cancers insensitive to NAC. Materials and Methods: We retrospectively enrolled 125 breast cancer patients (63 in the primary cohort and 62 in the validation cohort) who underwent magnetic resonance imaging (imaging sequences: diffusion-weighted, T2-weighted and contrast-enhanced T1-weighted) before receiving NAC. All patients underwent surgical resection, and the Miller-Payne grading system was applied to assess the response to NAC. Grade 1-2 cases were classified as insensitive to NAC. We extracted 1941 radiomic features in the primary cohort. After feature selection, the optimal feature set was used to construct a radiomic signature using machine learning. We built a combined prediction model incorporating the radiomic signature and independent clinical risk factors using multivariable logistic regression. The performance of the combined model was assessed in the validation cohort. Results: The radiomic signature consisting of 4 features performed well in identifying the Grade 1-2 group, yielding an area under the curve (AUC) of 0.83 (95% confidence interval: 0.647–1) in the validation cohort. A clinical model based on human epidermal growth factor receptor-2 (HER2) status and Ki67 index yielded an AUC of 0.792 (95% confidence interval: 0.668–0.916) in the validation cohort.
Incorporating the radiomic signature, HER2 status and Ki67 index, the combined prediction model achieved better discrimination than either the radiomic signature or the clinical model, with an AUC of 0.935 (95% confidence interval, 0.848–1) in the validation cohort. Conclusion: The combined model based on radiomics and clinical variables has great potential for predicting drug-insensitive breast cancers. Citation Format: Xuezhi Zhou, Zhenyu Liu, Yang Du, Qianqian Xiong, Kun Wang, Jie Tian. Radiomics improved pre-therapeutic prediction of breast cancers insensitive to neoadjuvant chemotherapy [abstract]. In: Proceedings of the 2019 San Antonio Breast Cancer Symposium; 2019 Dec 10-14; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2020;80(4 Suppl):Abstract nr P1-10-29.
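
The AUC values used above to compare the radiomic, clinical and combined models can be computed with the Mann-Whitney pairwise formulation: the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. This is a generic sketch with toy labels and scores; it does not reproduce the study's confidence intervals.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 for positive cases (e.g. insensitive to NAC), 0 otherwise.
    scores: model outputs, higher meaning more likely positive.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))
```

The pairwise form costs O(n_pos × n_neg) comparisons, which is negligible for validation cohorts of a few dozen patients.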
