Abstract
Missing data is inevitable and ubiquitous in intelligent transportation systems (ITSs). A handful of completion methods have been proposed, among which tensor-based models have been shown to be the most advantageous for missing traffic data imputation. Despite their superior imputation accuracy, the imputed data are not uniformly adopted in modern ITS applications. The primary goal of this paper is to explore how tensor completion methods can be used to support ITSs. In particular, we study how to improve traffic flow prediction accuracy under different missing scenarios. Three common missing scenarios are considered: element-wise random missing, time-structured missing, and space-structured missing. Four classical tensor completion models are used to impute the missing data: Smooth PARAFAC Decomposition based Completion (SPC), CP Decomposition-based Completion (CP-WOPT), Tucker Decomposition-based Completion (TDI), and High-accuracy Low-rank Tensor Completion (HaLRTC). Four well-known prediction methods are tested: Support Vector Regression (SVR), K-nearest Neighbor (KNN), Gradient Boost Regression Tree (GBRT), and Long Short-term Memory (LSTM). Traffic data completed by simple mean value interpolation serve as the baseline. Extensive experiments show that, in most cases, traffic flow prediction can be improved by increasing the accuracy of missing traffic data imputation. Interestingly, we find that an imputation model cannot improve prediction accuracy when the sparsely observed training datasets already provide sufficient training samples.
Highlights
The Smooth PARAFAC decomposition based tensor completion (SPC), which is combined with the total variation (TV norm) or quadratic variation (QV norm) and was proposed by Yokota et al. [28], has been shown to perform the best, especially when the missing ratio is over 95%.
Support Vector Regression (SVR) is better suited to small training datasets [51], [52]; it provides the smallest mean absolute error (MAE) and root mean squared error (RMSE) when using the data completed by simple mean value interpolation.
To the best of our knowledge, this is the first paper to analyze in detail the effects of missing data and its completion on traffic flow prediction, which is a basic technology of intelligent transportation systems (ITSs).
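The highlights refer to a mean value interpolation baseline and to the MAE and RMSE error measures. The following is a minimal sketch of these three pieces, not the authors' code. It assumes that "mean value interpolation" means filling every missing entry with the global mean of the observed entries; the paper may use a different variant. The function names `mean_impute`, `mae`, and `rmse` and the toy values are hypothetical.

```python
import numpy as np

def mean_impute(data, mask):
    """Fill unobserved entries (mask == 0) with the mean of observed ones.

    Assumption: a single global mean is used; the paper may average
    per sensor or per time slot instead.
    """
    filled = np.asarray(data, dtype=float).copy()
    observed = np.asarray(mask) == 1
    filled[~observed] = filled[observed].mean()
    return filled

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example (made-up values): the second reading is treated as missing.
flow = np.array([10.0, 12.0, 14.0, 16.0])
mask = np.array([1, 0, 1, 1])
imputed = mean_impute(flow, mask)
print(imputed, mae(flow, imputed), rmse(flow, imputed))
```

Because RMSE squares the errors before averaging, it penalizes a few large deviations more heavily than MAE does, which is why the two measures are usually reported together.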
Summary
Because their number of parameters is fixed, parametric methods [38], such as the Autoregressive Integrated Moving Average (ARIMA) [39], fail to fit complex functions and cannot maintain robust prediction for heterogeneous traffic data. To avoid this problem, various nonparametric methods have been proposed, including Artificial Neural Networks (ANNs) [40], K-nearest Neighbor (KNN) [41], Support Vector Regression (SVR) [42], and ensemble methods such as the Gradient Boosting Regression Tree (GBRT) [43].
They demonstrated that completing missing data can improve the accuracy of traffic flow prediction. Their analysis is not comprehensive: it considered only two kinds of mixed random missing scenarios, one matrix completion based imputation method (PPCA), and only low missing rates (up to 50% missing).
All the slices of ω along the location mode, i.e. ω(:, :, i3) for all i3 ∈ {1, 2, ..., I3}, are set to the same randomly missing matrix M ∈ ℝ^{I1×I2} with random entries 0 and 1.
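The mask construction described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation. It assumes a third-order tensor of size I1 × I2 × I3 whose third mode is the location mode, that 1 marks an observed entry and 0 a missing one, and that entries of M are drawn independently. The function name `shared_slice_mask` and its parameters are hypothetical.

```python
import numpy as np

def shared_slice_mask(shape, missing_ratio, seed=0):
    """Build a 0/1 indicator tensor whose location-mode slices are identical.

    shape         : (I1, I2, I3), with the location mode last (assumption)
    missing_ratio : probability that an entry of M is 0 (missing)
    """
    i1, i2, i3 = shape
    rng = np.random.default_rng(seed)
    # Random matrix M of size I1 x I2 with entries 0 (missing) and 1 (observed).
    m = (rng.random((i1, i2)) >= missing_ratio).astype(int)
    # Copy M into every slice omega[:, :, k] along the location mode.
    return np.repeat(m[:, :, np.newaxis], i3, axis=2)

# Toy sizes chosen only for illustration.
omega = shared_slice_mask((4, 5, 3), missing_ratio=0.5, seed=42)
print(omega.shape)
```

Since every slice reuses the same M, a time index that is missing at one location is missing at all locations, so the missing pattern is structured rather than independent across entries.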