A Gradient Boosting Crash Prediction Approach for Highway-Rail Grade Crossing Crash Analysis

Pan Lu, Amin Keramati, Xiaoyi Zhou, Ying Huang, Zijian Zheng, Denver Tolliver, Yihao Ren

doi:10.1155/2020/6751728

Abstract

Highway-rail grade crossing (HRGC) crashes continue to be major contributors to rail casualties in the United States and have been intensively researched in the past. Data-mining models focus on prediction, while the dominant general linear models focus on model and data fitness. Decision makers and traffic engineers rely on prediction models to examine at-grade crash frequency and make safety improvements. The gradient boosting (GB) model has gained popularity in many research areas. In this study, to fully understand the model's performance in HRGC crash prediction, the GB model with a functional gradient descent algorithm is selected to analyze crashes at HRGCs and to identify contributing factors. Moreover, contributors' importance and partial-dependence relations are generated to further understand the relationship between the identified contributors and HRGC crash likelihood, and to overcome the "black box" issue that most machine learning methods face. Furthermore, to fully demonstrate the model's prediction performance, a comprehensive assessment of prediction power based on six measurements is conducted, and the prediction performance of the GB model is verified and compared with a decision tree model as a reference, owing to their popularity and comparable data availability. It is demonstrated that the GB model produces better prediction accuracy and reveals nonlinear relationships among contributors and crash likelihood. In general, HRGC crash likelihood is significantly impacted by several traffic exposure factors: highway traffic volume, railway traffic volume, train travel speed, and others.

Highlights

Crashes between motor vehicles and trains at highway-rail grade crossings (HRGCs) often have severe consequences [1].
Of all crashes at HRGCs in the U.S. (2000 to 2014), 12% resulted in fatalities [2].
As indicated by Lu and Tolliver [5] and Oh et al. [6], HRGC crash data often show underdispersion, where the sample variance is less than the sample mean, and less common generalized linear models (GLMs) are suitable for such datasets.
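The underdispersion condition in the last highlight can be checked with a variance-to-mean ratio. The snippet below is a minimal sketch using only the Python standard library; the function name `dispersion_index` and the crash counts are hypothetical illustrations, not data or code from the paper.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the dispersion index is undefined")
    return variance(counts) / m


# Hypothetical crash counts per crossing (illustrative values only).
crash_counts = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]

ratio = dispersion_index(crash_counts)
# A ratio below 1 means the sample variance is less than the sample mean.
label = "underdispersed" if ratio < 1 else "equidispersed or overdispersed"
print(f"variance/mean = {ratio:.3f} ({label})")
```

A ratio near 1 is what a Poisson model assumes, so a ratio well below 1 is one simple signal that a standard Poisson GLM may fit count data of this kind poorly.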

Summary

Introduction

Crashes between motor vehicles and trains at highway-rail grade crossings (HRGCs) often have severe consequences [1]. Numerous models have been developed to identify major contributing factors and to explore relationships between crashes and explanatory variables, in order to better understand safety performance and apply effective countermeasures that reduce crash rates at HRGCs. Since crash data have random, discrete, and nonnegative characteristics, generalized linear models (GLMs) [3] have been commonly selected to investigate the relationship between crashes and contributing factors. To fully demonstrate the model application and its capabilities to analyze safety data, a robust data-mining technique, the gradient boosting (GB) model, is selected to analyze crashes at HRGCs. Unlike GLMs, it requires no predefined underlying relationship between dependent and independent variables. Thus, underdispersed HRGC data are not an issue. To better understand the model's forecasting performance, a comprehensive forecasting accuracy evaluation system including six measurements is proposed and evaluated.
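To make the idea of boosting by functional gradient descent concrete, the following is a minimal sketch in pure Python, not the authors' implementation. It assumes a binary crash/no-crash label, log loss, one-split regression trees (stumps) as base learners, and a fixed learning rate with leaf values set to the mean residual, which is a simplification of the usual leaf-wise step. All function names, feature meanings, and data values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gb(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on log loss; assumes both classes appear in y."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # initial score: log-odds of the base rate
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss with respect to the score: y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    f0, learning_rate, stumps = model
    score = f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(score)


# Hypothetical crossings: [highway traffic (thousand veh/day), trains/day].
X = [[1, 2], [2, 1], [3, 4], [4, 3], [8, 10], [9, 12], [10, 9], [12, 15]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = at least one crash (illustrative labels)

model = fit_gb(X, y)
print(round(predict_proba(model, [11, 11]), 3))
print(round(predict_proba(model, [2, 2]), 3))
```

Each round fits a stump to the current residuals and adds a damped copy of it to the running score, so no functional form linking predictors and crash likelihood has to be specified in advance. This is the property the paragraph above contrasts with GLMs.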

Literature Review

Methodology

Findings

Research Summary