Abstract

The automated enforcement system (AES) is an effective supplement to traditional traffic enforcement, and the traffic violation data it collects can also be used for safety research. In this study, AES violation data were used to analyze the factors associated with traffic violations and to predict the probability of violations at intersections. The candidate factors comprised 24 independent variables related to time, space, traffic and weather. Results from a logistic model showed that the midday period, weekends, residential districts, collector roads, congested traffic conditions, high traffic flow, low wind speed and low temperature were associated with a higher probability of traffic violations. The probability of violations was predicted with the random forest algorithm, which performed best among the models compared; the others were logistic regression, Gaussian naive Bayes, and support vector machine. Moreover, the proximity weighted synthetic oversampling technique (ProWSyn) was applied to reduce the impact of the imbalance ratio (IR) and improve the model's prediction performance. The receiver operating characteristic (ROC) curves and precision-recall (PR) curves showed that the random forest algorithm predicted better with oversampled data than with undersampled data. With IR = 1, the area under the curve (AUC) reached 0.914 and the out-of-bag (OOB) error reached 0.0787, indicating that the random forest algorithm combined with ProWSyn performs well on imbalanced traffic violation data.
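
To make the imbalance ratio concrete, the following is a minimal sketch of rebalancing a binary data set to IR = 1. It assumes the common definition of IR as the majority-class count divided by the minority-class count, which the abstract does not spell out. It also swaps in plain random duplication of minority samples for ProWSyn, which generates synthetic samples weighted by proximity to the class boundary. The function names and toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, target_ir=1.0, seed=0):
    """Duplicate random minority rows until majority/minority reaches target_ir.

    This is a simple stand-in for ProWSyn, not an implementation of it.
    Assumes binary labels.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority_n = max(counts.values())
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    # Number of extra minority samples needed to reach the target ratio.
    needed = int(majority_n / target_ir) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(max(needed, 0))]
    return rows + extra, labels + [minority] * len(extra)


# Toy data (hypothetical): 1 = violation (rare), 0 = no violation.
rows = [[i] for i in range(100)]
labels = [1] * 10 + [0] * 90

new_rows, new_labels = random_oversample(rows, labels, target_ir=1.0)
print(imbalance_ratio(labels), imbalance_ratio(new_labels))
```

In practice the rebalancing would be applied to the training split only, so that the AUC and OOB error are not inflated by duplicated samples appearing in the evaluation data.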
