Abstract

Real-time information mining of big datasets consisting of time series data is a challenging task. To address it, we propose using the mean distance and the standard deviation to improve the accuracy of the existing fast incremental model tree with drift detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion; we additionally use the mean distance and standard deviation so that the tree is split more accurately than with the standard method. We verify the proposed method on three datasets: the large Traffic Demand Dataset (4,000,000 instances), Tennet’s large wind power plant dataset (435,268 instances), and a road weather dataset (30,000,000 instances). The results show that our modified FIMT-DD algorithm is more accurate than both the standard method and the Chernoff bound approach. The measured errors show that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
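The baseline splitting test that the proposed method builds on can be sketched as follows. This is a minimal sketch assuming the formulation of FIMT-DD by Ikonomovska et al.: candidate splits are scored by standard deviation reduction (SDR), and a split is accepted when the ratio of the second-best to the best SDR, plus the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), stays below 1, or when ε falls below a tie-breaking threshold τ. The function names and the default values of `delta` and `tau` are illustrative assumptions. The abstract does not specify the mean-distance and standard-deviation criterion in enough detail to sketch it, so only the Hoeffding-bound baseline is shown.

```python
import math
import statistics
from typing import Sequence


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def sd_reduction(parent: Sequence[float], left: Sequence[float],
                 right: Sequence[float]) -> float:
    """Standard deviation reduction (SDR) achieved by a binary split."""
    n = len(parent)
    return (statistics.pstdev(parent)
            - len(left) / n * statistics.pstdev(left)
            - len(right) / n * statistics.pstdev(right))


def should_split(sdr_best: float, sdr_second: float, n: int,
                 delta: float = 1e-7, tau: float = 0.05) -> bool:
    """Hoeffding-bound split decision on the SDR ratio (assumed defaults).

    The ratio sdr_second / sdr_best lies in [0, 1], so the range R is 1.
    """
    if sdr_best <= 0.0:
        return False  # no candidate reduces the spread; do not split
    eps = hoeffding_bound(1.0, delta, n)
    ratio = sdr_second / sdr_best
    # Split if the best candidate is reliably better than the runner-up,
    # or if the bound is already so tight that the two are effectively tied.
    return ratio + eps < 1.0 or eps < tau


if __name__ == "__main__":
    # Hypothetical target values observed at a leaf, sorted by some attribute.
    targets = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
    best = sd_reduction(targets, targets[:3], targets[3:])
    second = sd_reduction(targets, targets[:2], targets[2:])
    print(best, second, should_split(best, second, n=len(targets)))
```

With only six examples the bound is still wide, so the test declines to split; in a stream, `n` keeps growing and ε shrinks until the decision becomes confident.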

Highlights

  • Real-time information mining for regression problems involving a huge time series dataset is becoming an increasingly challenging task in the data mining community

  • We have improved the accuracy of the fast incremental model tree with drift detection (FIMT-DD) algorithm [5], which was developed by Ikonomovska et al. [3]

  • On all three evaluation metrics (Mean Absolute Error, MAE; Root Mean Square Error, RMSE; and Mean Absolute Percentage Error, MAPE), our approach gives consistently lower errors than previous methods
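The three error metrics named in the highlights can be computed as below. This is a minimal sketch using the textbook definitions; MAPE is reported in percent and is undefined when a true value is zero, so this sketch raises an error in that case (how the paper handles zero targets is not stated).

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Require two non-empty sequences of equal length."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent."""
    _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    actual = [100.0, 200.0, 400.0]      # hypothetical observations
    predicted = [110.0, 190.0, 400.0]   # hypothetical model outputs
    print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Because MAPE is scale-free, it is the natural metric for comparing methods across datasets as different as traffic demand and wind power, which may be why the abstract reports its improvements in MAPE.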

Summary

Introduction

Real-time information mining for regression problems involving a huge time series dataset is becoming an increasingly challenging task in the data mining community. The authors applied deep belief networks to predict traffic flow and wind power on Tennet’s wind power plant dataset, using both stream and event-based data, and used Dempster–Shafer theory to update the beliefs and integrate the results. The results of that research showed that the measurement error of the FIMT-DD algorithm follows a decreasing trend over the stream evaluation process [11].
