Abstract

Quick and accurate crash detection is important for saving lives and improving traffic incident management. In this paper, a feature fusion-based deep learning framework was developed for video-based urban traffic crash detection, aiming to balance detection speed and accuracy under limited computing resources. In this framework, a residual neural network (ResNet) combined with attention modules was proposed to extract crash-related appearance features from urban traffic videos (i.e., a crash appearance feature extractor), which were further fed to a spatiotemporal feature fusion model, Conv-LSTM (Convolutional Long Short-Term Memory), to simultaneously capture appearance (static) and motion (dynamic) crash features. The proposed model was trained on a set of video clips covering 330 crash and 342 noncrash events. Overall, the proposed model achieved an accuracy of 87.78% on the testing dataset at an acceptable detection speed (FPS > 30 on a GTX 1060). Thanks to the attention module, the proposed model captures localized appearance features of crashes (e.g., vehicle damage and fallen pedestrians) better than conventional convolutional neural networks. The Conv-LSTM module outperformed a conventional LSTM in capturing the motion features of crashes, such as roadway congestion and pedestrians gathering after a crash. Compared with a traditional motion-based crash detection model, the proposed model achieved higher detection accuracy. Moreover, it detected crashes much faster than other feature fusion-based models (e.g., C3D). The results show that the proposed model is a promising video-based urban traffic crash detection algorithm that could be applied in practice.

Highlights

  • Traffic crashes can cause property damage, injuries, death, and nonrecurrent congestion

  • The attention module was combined with a residual neural network (ResNet) to capture the appearance features of crash images. The ResNet improves the speed of a conventional convolutional neural network, while the attention module enables the model to focus on localized appearance features instead of irrelevant information, further boosting performance. The output feature map is then reduced in dimension via a 1 × 1 convolutional layer and fed chronologically into the Conv-LSTM network to further extract the motion features of crashes (see the sketch after this list)

  • A set of deep learning models was compared for differentiating crash images from noncrash images, with the purpose of finding the best crash appearance feature extractor, which was further linked to the Conv-LSTM
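
The sketch below is a minimal PyTorch rendering of the pipeline described in the highlights, not the authors' implementation: the ResNet-18 backbone, the CBAM-style spatial attention, the hidden-state width, and the clip length are all assumptions, since the paper's exact architecture details and code are not reproduced here. It only illustrates the data flow: per-frame appearance features from an attention-augmented ResNet, a 1 × 1 convolution for dimension reduction, and a Conv-LSTM that consumes the frames chronologically.

```python
# Minimal sketch of the described pipeline (assumed PyTorch implementation).
# The attention module, backbone depth, and hidden width are illustrative
# stand-ins, not the paper's exact choices.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SpatialAttention(nn.Module):
    """Stand-in spatial attention: reweights the feature map so the network
    can focus on localized crash cues (e.g., damage, fallen pedestrians)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)        # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)       # channel-wise max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                          # attention-weighted features


class ConvLSTMCell(nn.Module):
    """Minimal Conv-LSTM cell: LSTM gates computed with convolutions, so the
    hidden state keeps its spatial layout and can track motion patterns
    (e.g., queue build-up and crowds gathering after a crash)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class CrashDetector(nn.Module):
    def __init__(self, hid_ch=64):
        super().__init__()
        backbone = resnet18(weights=None)        # assumed ResNet variant
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attention = SpatialAttention()
        self.reduce = nn.Conv2d(512, hid_ch, kernel_size=1)  # 1x1 reduction
        self.convlstm = ConvLSTMCell(hid_ch, hid_ch)
        self.head = nn.Linear(hid_ch, 2)         # crash / noncrash logits

    def forward(self, clip):                     # clip: (B, T, 3, H, W)
        h = c = None
        for step in range(clip.shape[1]):        # frames fed chronologically
            f = self.reduce(self.attention(self.features(clip[:, step])))
            if h is None:                        # zero-initialize the state
                h, c = torch.zeros_like(f), torch.zeros_like(f)
            h, c = self.convlstm(f, (h, c))
        return self.head(h.mean(dim=(2, 3)))     # pool final state, classify


# Example: a 16-frame clip at 224x224 yields crash/noncrash logits.
logits = CrashDetector()(torch.randn(1, 16, 3, 224, 224))
```

One plausible reading of the design: the 1 × 1 convolution shrinks the 512-channel ResNet output to a small hidden width before the recurrent step, keeping the per-frame Conv-LSTM update cheap, which is consistent with the reported real-time speed (FPS > 30) on modest hardware.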

Introduction

Traffic crashes can cause property damage, injuries, death, and nonrecurrent congestion. Traditional crash/incident detection methods mostly rely on traffic flow modeling techniques [1,2,3,4,5,6,7]. The basic idea of traffic flow modeling is to identify nonrecurrent congestion based on data from loop detectors, microwave sensors, and probe vehicles. Thus, the performance of the traffic flow modeling approach heavily depends on the quality of the data obtained from traffic detectors; it can fail when the traffic environment is too complex (e.g., multimodal traffic in urban areas), so the detection accuracy of such methods is not always guaranteed. Another emerging method is to identify incidents based on crowdsourcing data [8]. With the development of intelligent transportation systems (ITS), video cameras have been widely installed in many cities and on highways. Thanks to their wide coverage, vision-based crash detection techniques have gained increasing research attention in recent years [9].
