Abstract

Accurate train arrival delay prediction is critical for real-time train dispatching and for improving transportation service. This study proposes a data-driven method that combines eXtreme Gradient Boosting (XGBoost) with a Bayesian optimization (BO) algorithm to predict train arrival delays. First, eleven characteristics that may affect the train arrival time at the next scheduled station are identified as independent variables. Second, an XGBoost prediction model is established to capture the relationship between train arrival delays and these railway system characteristics. Third, the BO algorithm is applied to tune the hyperparameters of the XGBoost model and improve its prediction accuracy. Case studies using data from two high-speed railway (HSR) lines in China are then performed to analyze the prediction efficiency and accuracy of the proposed model for different delay bins and at different stations. The results on the two HSR lines demonstrate that the proposed method outperforms the benchmark models in terms of the coefficient of determination (0.9889/0.9905), root-mean-squared error (2.686/1.887), and mean absolute error (0.896/0.802). In addition, statistical tests, namely the Friedman test (FT) and the Wilcoxon signed-rank test (WSRT), are carried out to validate the efficacy of the proposed method. Furthermore, train arrival delays under different abnormal events can also be accurately forecast using the proposed method; the results indicate that it outperforms the benchmark methods, especially in the prediction of long delays caused by specific abnormal events.
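
The three performance metrics named above follow standard definitions, and a minimal sketch of how they might be computed is given here for illustration. The function name `regression_metrics` and the sample delay values are hypothetical and are not taken from the study; the unit of the delays (minutes) is likewise an assumption, since the abstract does not state it.

```python
import math


def regression_metrics(actual, predicted):
    """Return (r2, rmse, mae) for paired observed and predicted delays."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot          # coefficient of determination
    rmse = math.sqrt(ss_res / n)        # root-mean-squared error
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    return r2, rmse, mae


# Hypothetical arrival delays (assumed to be in minutes), not data from the study.
observed = [0.0, 2.0, 5.0, 12.0, 30.0]
forecast = [1.0, 2.0, 4.0, 14.0, 27.0]
r2, rmse, mae = regression_metrics(observed, forecast)
print(f"R2={r2:.4f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE does, which is one reason both are usually reported together when long delays are of particular interest.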
