Abstract

In the era of big data, analyzing large volumes of real-time data is increasingly important; air quality data, for example, arrive as streaming time series measured by several different sensors. Numerous time-series forecasting methods and neural-network-based deep-learning approaches have been applied to such data. However, they usually rely on a single model under a stationarity assumption, and few studies address real-time prediction of dynamic, massive multivariate data. Using the variety of independent variables included in the data is important for improving forecasting performance. In this paper, we propose a real-time prediction approach based on an ensemble method for multivariate time-series data. The proposed method selects multivariate time-series variables and incorporates real-time updatable autoregressive models on the basis of their performance. We verified the proposed model using simulated data and applied it to predict air quality measured by five sensors and to predict failures from real-time performance log data in server systems. The proposed method showed effective and stable performance on both short- and long-term air pollution prediction tasks. In addition, whereas traditional methods for abnormality detection focus on classifying the present status of objects as normal or abnormal based on the provided data, we proactively predict the expected statuses of objects from real-time data and, through the proposed method, support effective system management in cloud environments.

Highlights

  • Massive real-time data storage and real-time data visualization are available in many industries and have improved data analysis techniques for real-time data

  • We propose a multivariate ensemble method based on a dynamic transfer model for stable real-time prediction and verify its performance by predicting failures from performance log data generated in a server system

  • When one attempts to find lags from the cross-correlation between input and output variables, the model can be unclear and too empirical [16]. Solutions such as Monte-Carlo-based analysis [17] have been proposed to address this problem; we suggest a novel ensemble-based method
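
For readers unfamiliar with the lag-selection step mentioned in the last highlight, the following is a minimal sketch of the conventional cross-correlation approach, not the ensemble method proposed in the paper. The function names, the synthetic data, and the delay of 3 steps are all illustrative assumptions.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t - lag] and y[t], for lag >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    norm_x = math.sqrt(sum((v - mean_x) ** 2 for v in x))
    norm_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    cov = sum((x[t - lag] - mean_x) * (y[t] - mean_y) for t in range(lag, n))
    return cov / (norm_x * norm_y)


def best_lag(x, y, max_lag):
    """Return the lag in 0..max_lag with the largest absolute cross-correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(cross_correlation(x, y, k)))


# Hypothetical synthetic data: the output y follows the input x after 3 steps.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] * 3 + [0.8 * x[t - 3] + 0.1 * rng.gauss(0.0, 1.0) for t in range(3, 500)]

selected_lag = best_lag(x, y, max_lag=10)
print("selected lag:", selected_lag)
```

On real sensor or log data the cross-correlation peak is rarely this sharp, which is one reason a lag chosen this way can be ambiguous and empirical.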


Summary

Introduction

Massive real-time data storage and real-time data visualization are available in many industries and have improved data analysis techniques for real-time data. With the development of web and IoT systems, large volumes of log data are now produced between web or mobile applications and server systems. APM (Application Performance Management) is a real-time log data analysis system that collects and manages performance information generated between user applications and the services of a server system, such as a web application server or database server. Various extended models such as VAR (Vector Autoregressive) and VARMA
