Abstract

Data streams have emerged in recent years as a relevant topic and an efficient means of extracting knowledge from big data. In the robust layered ensemble model (RLEM) proposed in this paper for short-term traffic flow forecasting, incoming traffic flow data from all connected road links are organized in chunks corresponding to an optimal time lag. The RLEM is composed of two layers. In the first layer, the chunks are clustered using the Graded Possibilistic c-Means method. The second layer is made up of an ensemble of forecasters, each trained for short-term traffic flow forecasting on the chunks belonging to a specific cluster. In the operational phase, when a new chunk of traffic flow data is presented as input to the RLEM, its memberships to all clusters are evaluated and, if it is not recognized as an outlier, the outputs of all forecasters are combined in an ensemble, yielding a forecast of traffic flow over a short-term time horizon. The proposed RLEM is evaluated on a synthetic data set, on a traffic flow data simulator, and on two real-world traffic flow data sets. The model gives accurate forecasts of traffic flow rates with outlier detection and adapts well to non-stationary traffic regimes. Given its outlier detection, accuracy, and robustness, RLEM can be fruitfully integrated into traffic flow management systems.
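The operational phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian-kernel membership used here is a simplified stand-in for the Graded Possibilistic c-Means memberships, and the names `memberships`, `rlem_predict`, `beta`, and `outlier_threshold`, as well as the toy centroids and forecasters, are hypothetical choices made only for illustration.

```python
import math

def memberships(chunk, centroids, beta=1.0):
    """Possibilistic-style membership of a chunk to each cluster centroid.

    Assumption: a Gaussian kernel of the squared Euclidean distance.
    Each value lies in [0, 1] and the values need not sum to 1.
    """
    result = []
    for centroid in centroids:
        d2 = sum((x - c) ** 2 for x, c in zip(chunk, centroid))
        result.append(math.exp(-d2 / beta))
    return result

def rlem_predict(chunk, centroids, forecasters, beta=1.0, outlier_threshold=0.05):
    """Return a membership-weighted ensemble forecast, or None for an outlier.

    A chunk is treated as an outlier when its membership to every
    cluster falls below `outlier_threshold` (a hypothetical rule).
    """
    u = memberships(chunk, centroids, beta)
    if max(u) < outlier_threshold:
        return None  # no cluster recognizes this chunk
    total = sum(u)
    # Combine the per-cluster forecasts, weighted by membership.
    return sum(w * f(chunk) for w, f in zip(u, forecasters)) / total

# Toy setup: two clusters (low and high traffic) and one forecaster per cluster.
centroids = [[10.0, 10.0, 10.0], [50.0, 50.0, 50.0]]
forecasters = [lambda c: c[-1] + 1.0, lambda c: c[-1] - 1.0]

typical = rlem_predict([10.0, 10.0, 10.0], centroids, forecasters, beta=50.0)
anomalous = rlem_predict([200.0, 200.0, 200.0], centroids, forecasters, beta=50.0)
```

In this toy setup the first chunk sits on the low-traffic centroid, so its forecast is dominated by the first forecaster, while the second chunk is far from both centroids and is rejected as an outlier.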

Highlights

  • Data streams are ordered, potentially unbounded sequences of observations made available over time [24, 43, 57, 58]

  • The experimental validation of the proposed robust layered ensemble model included a test of the clustering procedure, based on the Graded Possibilistic c-Means, on an artificial data set with built-in concept drift and shift

  • We have proposed the robust layered ensemble model (RLEM) for short-term traffic flow forecasting


Summary

Introduction

Data streams are ordered, potentially unbounded sequences of observations (data elements) made available over time [24, 43, 57, 58]. In many data stream mining applications where data exhibit a time series nature, the goal is to predict information about future instances in the data stream given some knowledge about previous ones. This can be approached either by modelling the dynamics of the system or by autoregressive models. As an example of this latter case, a sensor network may provide just the information that requires attention by the human supervisor, rather than transmitting all records. This task goes by the name of anomaly or outlier detection [7, 11].
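As a minimal sketch of the autoregressive approach mentioned above, the following code splits a series into lagged chunks and fits a first-order autoregressive model by least squares. It is an illustration under stated assumptions, not the forecaster used in the paper: the function names `make_chunks`, `fit_ar1`, and `forecast_next`, the absence of an intercept, and the toy series are all hypothetical.

```python
def make_chunks(series, lag):
    """Split a series into (chunk, target) pairs.

    Each chunk holds `lag` consecutive observations and the target
    is the observation that immediately follows the chunk.
    """
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] ~ phi * x[t-1] (no intercept)."""
    pairs = make_chunks(series, lag=1)
    numerator = sum(chunk[0] * target for chunk, target in pairs)
    denominator = sum(chunk[0] ** 2 for chunk, _ in pairs)
    return numerator / denominator

def forecast_next(series, phi):
    """One-step-ahead forecast from the most recent observation."""
    return phi * series[-1]

# Toy series that doubles at every step.
series = [1.0, 2.0, 4.0, 8.0, 16.0]
phi = fit_ar1(series)
next_value = forecast_next(series, phi)
```

A larger `lag` in `make_chunks` yields the kind of fixed-length chunks on which a higher-order forecaster could be trained; the first-order fit is kept here only because it has a closed form.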
