Traffic crash prediction (TCP) is a fundamental problem for intelligent transportation systems in smart cities. Improving its accuracy is important for road safety and effective traffic management. Owing to recent advances in artificial neural networks, several deep-learning models have been proposed for TCP. However, these works mainly focus on accidents in regions, which are typically predetermined using a grid map. We argue that TCP for roads has significant practical and research value and deserves more attention, especially for crashes at or near road intersections, which account for more than 50% of fatal or injury crashes according to the Federal Highway Administration. In this paper, we formulate TCP at road intersections as a classification problem and propose a three-phase, data-driven deep-learning model, called Road Intersection Traffic Crash Prediction (RoadInTCP), that predicts traffic crashes at intersections by exploiting publicly available heterogeneous big data. In Phase I, we extract discriminative latent features of intersections, called topological-relational features (tr-features), using a neural network model that exploits the topological information of the road network and various relationships among nearby intersections. In Phase II, in addition to tr-features, which capture some inherent properties of the road network, we explore thematic information in the form of environmental, traffic, weather, risk, and calendar features associated with intersections. To incorporate the potential correlation among nearby intersections, we use a Graph Convolution Network (GCN) to aggregate features from neighboring intersections based on a message-passing paradigm. While Phase II serves well as a TCP model, in Phase III we further exploit the signals embedded in sequential feature changes over time by using an RNN or a 1D CNN, both of which have known success on sequential data.
Additionally, to address the serious issues of class imbalance in TCP and of large-scale heterogeneous data, we propose an effective data sampling approach in data preparation to facilitate model training. We evaluate the proposed RoadInTCP model via extensive experiments on a real-world New York City traffic dataset. The experimental results show that the proposed RoadInTCP robustly outperforms existing methods.
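
To make the neighbor-aggregation idea in Phase II concrete, the following is a minimal sketch of a single graph-convolution step in plain Python. It assumes the widely used symmetric-normalized formulation, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), which may differ from the layer actually used in RoadInTCP. The toy graph, feature values, weights, and function names (`normalize_adjacency`, `gcn_layer`) are all hypothetical and chosen only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each intersection also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One message-passing step: aggregate neighbor features, project, ReLU."""
    a_norm = normalize_adjacency(adj)
    aggregated = matmul(a_norm, features)    # mix features of adjacent nodes
    projected = matmul(aggregated, weights)  # weights would be learned in practice
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical toy road graph: three intersections in a line, 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Two illustrative features per intersection (values are made up).
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Illustrative weight matrix mapping 2 input features to 1 output feature.
weights = [[1.0],
           [1.0]]

out = gcn_layer(adj, features, weights)
for node, row in enumerate(out):
    print(node, row)
```

In this sketch, the middle intersection (node 1) receives contributions from both neighbors as well as itself, whereas the two end intersections each combine their own features with those of node 1 only. Stacking several such layers would let information propagate across more than one hop of the road network.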