Abstract

Reported financial losses from railroad accidents since 2009 exceed US$4.11 billion. A loss of this size is a major concern for the industry, society, and the government. Identifying and ranking the factors that contribute to financial losses from railroad accidents would therefore inform strategies to minimize them. To that end, this paper evaluates and compares the results of applying several non-parametric statistical and regression methods to 15 years of accident data for Class I railroad freight trains. The models compared are random forest, k-nearest neighbors, support vector machines, stochastic gradient boosting, extreme gradient boosting, and stepwise linear regression. The results indicate that all of these methods are suitable for analyzing non-linear and heterogeneous railroad incident data, but extreme gradient boosting performed best. The analysis therefore used that model to identify and rank the factors that contribute to financial losses, based on the gain percentage of the prediction accuracy attributable to each factor. The number of derailed freight cars and the absence of territory signalization were the dominant contributing factors, in more than 57% and 20% of the accidents, respectively. Partial-dependence plots further explore the complex non-linear dependencies on each factor, helping to visualize and interpret the results.
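
As a rough illustration of ranking factors by gain percentage, the sketch below normalizes per-feature gain totals into percentage shares and sorts them. It assumes the gain totals have already been extracted from a fitted boosted-tree model. The function name `gain_percentages`, the feature names, and the numbers are hypothetical placeholders invented for this example; they are not the paper's code or results.

```python
def gain_percentages(total_gain):
    """Convert per-feature gain totals into percentage shares, ranked high to low.

    total_gain: dict mapping feature name -> summed split gain (assumed input).
    """
    total = sum(total_gain.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    shares = {name: 100.0 * gain / total for name, gain in total_gain.items()}
    # Sort by share, largest first, to obtain the ranking.
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical gains, invented for illustration only (not the paper's results).
example_gain = {"feature_a": 6.0, "feature_b": 3.0, "feature_c": 1.0}
ranking = gain_percentages(example_gain)

for name, share in ranking:
    print(f"{name}: {share:.1f}% of total gain")
```

Normalizing to percentages makes the ranking independent of the absolute scale of the gain values, which is why shares rather than raw gains are compared.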
