To predict driving risk on urban arterial and collector roads, this paper collects and processes vehicle trajectory data and driver-vehicle-road-environment data. Risk scores and risk levels for the target roads are determined using the entropy-weighted Technique for Order Preference by Similarity to Ideal Solution (Entropy-TOPSIS) and K-means clustering. An original dataset of 10,320 samples is established, with risk levels as labels and multi-dimensional data as features. SHapley Additive exPlanations (SHAP) analysis is used to extract the important features. The performance of four ensemble models is compared: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Stacking. Three data resampling techniques are integrated to address class imbalance. The combination of LightGBM and a Deep Convolutional Generative Adversarial Network (DCGAN) outperforms the other combinations, showing potential for real-time risk prediction. In addition, DCGAN is employed to update the multi-dimensional features, and the models' predictive performance is evaluated under various update ratios using F1-score and training time; LightGBM demonstrates superior accuracy and efficiency compared with its counterparts (RF, XGBoost, and Stacking). This study develops a driving risk prediction system based on multi-dimensional features, offering a theoretical foundation and a technical approach for enhancing urban road traffic safety management.
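As an illustration of the scoring step, the following Python sketch computes entropy weights and TOPSIS closeness scores for a small made-up indicator matrix. The indicator names, their values, and the assumed risk direction of each indicator are hypothetical and are not taken from the dataset. Min-max normalization and the convention that a score near 1 means "closest to the highest-risk ideal point" are likewise assumptions, since the abstract does not specify them.

```python
import math


def min_max_normalize(matrix, risk_increasing):
    """Scale each column to [0, 1] so that larger values mean higher risk."""
    normalized_columns = []
    for col, increasing in zip(zip(*matrix), risk_increasing):
        low, high = min(col), max(col)
        span = high - low
        if span == 0:
            # A constant indicator carries no information.
            normalized_columns.append([0.0] * len(col))
        elif increasing:
            normalized_columns.append([(x - low) / span for x in col])
        else:
            normalized_columns.append([(high - x) / span for x in col])
    return [list(row) for row in zip(*normalized_columns)]


def entropy_weights(normalized):
    """Entropy weight method: indicators with more dispersion get more weight."""
    n = len(normalized)
    m = len(normalized[0])
    divergences = []
    for col in zip(*normalized):
        total = sum(col)
        if total == 0:
            divergences.append(0.0)
            continue
        entropy = 0.0
        for x in col:
            p = x / total
            if p > 0:  # treat 0 * ln(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence == 0:
        return [1.0 / m] * m
    return [d / total_divergence for d in divergences]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def topsis_scores(normalized, weights):
    """Relative closeness of each sample to the highest-risk ideal point."""
    weighted = [[w * x for w, x in zip(weights, row)] for row in normalized]
    columns = list(zip(*weighted))
    highest = [max(c) for c in columns]
    lowest = [min(c) for c in columns]
    scores = []
    for row in weighted:
        d_high = euclidean(row, highest)
        d_low = euclidean(row, lowest)
        total = d_high + d_low
        scores.append(d_low / total if total > 0 else 0.0)
    return scores


# Hypothetical indicators per road segment:
# speed variance, mean time headway (s), lane changes per minute.
samples = [
    [10.0, 0.8, 3.0],
    [20.0, 0.5, 5.0],
    [30.0, 0.2, 9.0],
    [15.0, 0.6, 4.0],
]
# Assumed direction: a shorter headway is treated as riskier.
risk_increasing = [True, False, True]

normalized = min_max_normalize(samples, risk_increasing)
weights = entropy_weights(normalized)
scores = topsis_scores(normalized, weights)
print("weights:", weights)
print("risk scores:", scores)
```

In the pipeline summarized above, scores of this kind would then be grouped into risk levels with K-means; that clustering step is omitted here.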