Abstract

The missing data problem is inevitable when collecting traffic data from intelligent transportation systems. Previous studies have shown the advantages of tensor completion-based approaches for multi-dimensional data imputation. In this paper, we extend the Bayesian probabilistic matrix factorization model of Salakhutdinov and Mnih (2008) to higher-order tensors and apply it to spatiotemporal traffic data imputation. In doing so, we consider not only the model configuration but also the data representation (i.e., matrix, third-order tensor, and fourth-order tensor). Using a nine-week spatiotemporal traffic speed data set (road segment × day × time of day) collected in Guangzhou, China, we evaluate the performance of this fully Bayesian model and explore, through extensive experiments, how different data representations affect imputation performance. The results show that the proposed model produces accurate imputations even under temporally correlated data corruption. Our experiments also show that data representation is a crucial factor in model performance, and that a third-order tensor structure outperforms the matrix and fourth-order tensor representations in preserving information in our data set. We hope this work offers insights to practitioners performing spatiotemporal data imputation tasks.
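
To make the three data representations concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in the paper. The array sizes are hypothetical and chosen only for illustration. The week × day-of-week split used for the fourth-order tensor is an assumption, since the abstract does not state how that tensor is organized.

```python
import numpy as np

# Hypothetical sizes for illustration only (not the dimensions of the
# Guangzhou data set).
n_segments, n_weeks, days_per_week, n_slots = 4, 2, 7, 6
n_days = n_weeks * days_per_week

rng = np.random.default_rng(0)

# Third-order tensor: road segment x day x time of day.
speed_3d = rng.uniform(20.0, 60.0, size=(n_segments, n_days, n_slots))

# Matrix: road segment x time, with (day, time of day) flattened
# into a single axis.
speed_2d = speed_3d.reshape(n_segments, n_days * n_slots)

# Fourth-order tensor (assumed layout): road segment x week x
# day of week x time of day.
speed_4d = speed_3d.reshape(n_segments, n_weeks, days_per_week, n_slots)

# The same observation is addressable in all three representations.
s, d, t = 1, 9, 3  # segment 1, day 9, time slot 3
same_in_matrix = speed_3d[s, d, t] == speed_2d[s, d * n_slots + t]
same_in_4d = speed_3d[s, d, t] == speed_4d[s, d // days_per_week, d % days_per_week, t]
print(speed_2d.shape, speed_3d.shape, speed_4d.shape, same_in_matrix, same_in_4d)
```

All three arrays hold exactly the same values. Only the index structure changes, and that structure determines which correlations (across days, across weeks, within a day) a factorization model can exploit.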
