Travel time forecasting on a freeway corridor: a dynamic information fusion model based on the random forests approach

Bo Qiu, Wei Fan

doi:10.1108/srt-11-2020-0027

Abstract

Purpose: Metropolitan areas suffer from frequent road traffic congestion, not only during peak hours but also during off-peak periods. Various machine learning methods have been used for travel time prediction; however, such methods often face the problem of overfitting. Tree-based ensembles have been applied in many prediction fields, and they usually achieve high prediction accuracy by aggregating and averaging individual decision trees. Beyond improving prediction results, these approaches offer a good bias-variance trade-off, which helps to avoid overfitting. However, the application of tree-based ensemble algorithms to traffic prediction is still limited. This study aims to improve the accuracy and interpretability of travel time models by using random forest (RF) to analyze and model travel time on freeways.

Design/methodology/approach: Because traffic conditions often change greatly, prediction results are often unsatisfactory. To improve the accuracy of short-term travel time prediction in the freeway network, a practically feasible and computationally efficient RF prediction method for real-world freeways was developed using probe traffic data. In addition, the relative importance of the variables was ranked, which provides an investigation platform for better understanding how different contributing factors might affect travel time on freeways.

Findings: The parameters of the RF model were estimated using the training sample set, and the proposed RF model was finalized once the parameter tuning process was complete. The relative feature importance showed that the travel time 15 min before and the time of day (TOD) contribute the most to the predicted travel time.
The model performance was also evaluated and compared against the extreme gradient boosting (XGBoost) method, and the results indicated that RF always produced more accurate travel time predictions.

Originality/value: This research developed an RF method to predict freeway travel time using probe vehicle-based traffic data and weather data. Detailed information about the input variables and data pre-processing is presented. To measure the effectiveness of the proposed travel time prediction method, the mean absolute percentage error (MAPE) was computed for different observation segments combined with prediction horizons ranging from 15 to 60 min.
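The workflow summarized in the abstract (fit an RF regressor, rank the relative importance of the features, and score each prediction horizon with MAPE) can be sketched with scikit-learn's RandomForestRegressor. This is a minimal, hypothetical sketch and not the authors' implementation. The travel time series is synthetic, and the following are all assumptions: the 5-minute sampling interval, the two features (the latest travel time observable at forecast time and TOD), the chronological 80/20 split, the horizons of 15, 30, 45 and 60 min, and the hyperparameters. Weather variables and the XGBoost comparison are omitted. MAPE is taken as (100/n) * sum(|y_i - yhat_i| / y_i), so actual travel times must be non-zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

STEP_MIN = 5  # assumed sampling interval of the probe data, in minutes


def make_synthetic_series(days=14, seed=0):
    """Hypothetical travel time series (minutes) with AM and PM peaks; not real probe data."""
    rng = np.random.default_rng(seed)
    steps_per_day = 24 * 60 // STEP_MIN
    minute_of_day = (np.arange(days * steps_per_day) % steps_per_day) * STEP_MIN
    travel_time = (
        10.0
        + 6.0 * np.exp(-(((minute_of_day - 480) / 60.0) ** 2))   # AM peak near 08:00
        + 8.0 * np.exp(-(((minute_of_day - 1050) / 75.0) ** 2))  # PM peak near 17:30
        + rng.normal(0.0, 0.5, minute_of_day.size)
    )
    return minute_of_day, travel_time


def build_features(minute_of_day, travel_time, horizon_min):
    """Features for target time t: travel time observed horizon_min earlier, and TOD at t.

    horizon_min must be a positive multiple of STEP_MIN. For a 15 min horizon the
    lagged feature is the travel time 15 min before the target time.
    """
    lag = horizon_min // STEP_MIN
    X = np.column_stack([travel_time[:-lag], minute_of_day[lag:]])
    y = travel_time[lag:]
    return X, y


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))


minute_of_day, travel_time = make_synthetic_series()
results = {}
for horizon in (15, 30, 45, 60):
    X, y = build_features(minute_of_day, travel_time, horizon)
    split = int(0.8 * len(y))  # chronological split: time series rows are not shuffled
    # Placeholder hyperparameters; the paper tunes its own on the training sample.
    rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
    rf.fit(X[:split], y[:split])
    # feature_importances_ is impurity-based and normalized to sum to 1.
    results[horizon] = (mape(y[split:], rf.predict(X[split:])), rf.feature_importances_)

for horizon, (score, importances) in results.items():
    lagged, tod = importances
    print(f"{horizon} min ahead: MAPE {score:.2f}%, importance lagged TT {lagged:.2f}, TOD {tod:.2f}")
```

Each horizon uses only the latest observation that would actually be available at forecast time, which avoids leaking future travel times into the features. The printed values describe the synthetic series only and say nothing about the results reported in this study.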

Highlights

Travel time prediction plays a significant role, as it can greatly help route planning and the development of countermeasures to reduce traffic congestion
The model performance was evaluated and compared against the extreme gradient boosting method, and the results indicated that the random forest (RF) always produced more accurate travel time predictions
Accurate travel time prediction can enhance the performance of traffic management systems by giving travelers the opportunity to react to traffic conditions proactively (Oh et al., 2015)

Summary

Introduction

Travel time prediction plays a significant role, as it can greatly help route planning and the development of countermeasures to reduce traffic congestion. Travel time has been widely used to measure the effectiveness of transportation systems and has increasingly become one of the most sought-after pieces of traffic information for travelers. The ability to accurately predict travel time in transportation networks is a critical component of traveler information systems. Accurate travel time prediction can enhance the performance of traffic management systems by giving travelers the opportunity to react to traffic conditions proactively (Oh et al., 2015). As an important performance indicator, accurately predicted travel times can also be used to compare different traffic management systems quantitatively. With the explosive growth of data collected by sensors and monitors, big data storage and processing issues have become increasingly relevant (Šemanjski, 2015).
