Weighted Ensemble Methods for Predicting Train Delays

Mostafa Al Ghamdi, Gerard Parr, Wenjia Wang

doi:10.1007/978-3-030-58799-4_43

Abstract

Train delays have become a serious and common problem in rail services because of increasing passenger numbers and limited network capacity. Accurate delay prediction is therefore essential for train controllers, who need it to devise plans that prevent or reduce delays. This paper presents a machine learning ensemble framework to improve the accuracy and consistency of train delay prediction. The basic idea is to train many different types of machine learning models for each station along a chosen train service journey, using historical data and relevant weather data, and then to select some of these models, according to certain criteria, to build an ensemble. The ensemble combines the outputs of its member models with an aggregation function to produce the final prediction. Two aggregation functions were devised: averaging and weighted averaging. The ensembles were implemented within a framework, and their performance was tested on data from an intercity train service as a case study. Accuracy was measured in two ways: the percentage of predictions that matched the actual arrival time of a train, and the percentage that fell within one minute of the actual arrival time. The mean accuracies and standard deviations for correct predictions are 42.3% (±11.24) for the individual models, 57.8% (±3.56) for the averaging ensembles, and 72.8% (±0.99) for the weighted ensembles. For predictions within one minute of the actual times, they are 86.4% (±14.05), 94.6% (±1.34) and 96.0% (±0.47) respectively. Overall, the ensembles significantly improved both the prediction accuracy and the consistency, and the weighted ensembles are clearly the best.
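The two aggregation functions and the two accuracy measures described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weights are derived, so the sketch assumes they are proportional to each member model's validation score, and it assumes predictions are rounded to whole minutes before being compared with the actual times. All function names and the sample numbers are hypothetical.

```python
import numpy as np


def average_ensemble(predictions):
    """Unweighted mean over member models; input shape (n_models, n_trains)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


def weighted_ensemble(predictions, scores):
    """Weighted mean over member models.

    Assumption: weights are the validation scores normalised to sum to 1.
    The abstract does not specify the weighting scheme.
    """
    preds = np.asarray(predictions, dtype=float)
    weights = np.asarray(scores, dtype=float)
    if weights.ndim != 1 or weights.shape[0] != preds.shape[0]:
        raise ValueError("need exactly one score per member model")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("scores must be non-negative with a positive sum")
    weights = weights / weights.sum()
    return weights @ preds


def accuracy_within(predicted, actual, tolerance=0):
    """Share of predictions within `tolerance` minutes of the actual value.

    Assumption: predictions are rounded to whole minutes first.
    tolerance=0 gives exact matches, tolerance=1 the within-one-minute rate.
    """
    rounded = np.rint(np.asarray(predicted, dtype=float))
    errors = np.abs(rounded - np.asarray(actual, dtype=float))
    return float(np.mean(errors <= tolerance))


# Made-up delays (minutes) predicted by three member models for four trains.
member_predictions = [
    [2, 5, 0, 3],
    [4, 5, 2, 3],
    [3, 8, 1, 6],
]
validation_scores = [0.5, 0.3, 0.2]  # hypothetical per-model scores
actual_delays = [3, 6, 1, 5]

avg = average_ensemble(member_predictions)
wavg = weighted_ensemble(member_predictions, validation_scores)

print("averaging:", avg)
print("weighted: ", wavg)
print("exact accuracy:", accuracy_within(wavg, actual_delays, tolerance=0))
print("within 1 min:  ", accuracy_within(wavg, actual_delays, tolerance=1))
```

Normalising the scores keeps the weighted output on the same scale as the member predictions, and plain averaging is the special case where every model has the same score.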

Full Text