Abstract

In an Active Traffic Management (ATM) system, real-time crash risk prediction is the basis for managing and improving traffic safety. Using traffic data collected by loop detectors on 237 road segments of the Shanghai Urban Expressway, this paper examines how the training set affects a crash risk prediction model and how the training set should be preprocessed to improve model quality in a continuous data environment. We build separate crash risk prediction models from three training sets (the full data set, randomly undersampled data, and matched case-control data) and compare their predictive performance to identify the best sampling mechanism. In addition, we apply the oversampling method SMOTE (Synthetic Minority Oversampling Technique) to the training sets described above to expand the crash data, and compare the models before and after expansion to determine whether oversampling improves the accuracy of a real-time crash risk prediction model in a continuous data environment. The commonly used logistic regression serves as the modeling algorithm, and the area under the ROC (Receiver Operating Characteristic) curve (AUC) is used as the evaluation metric. The experimental results show that, in a continuous data environment, the model trained on randomly undersampled data with a crash-to-non-crash ratio of 1:5 performs best, and that SMOTE yields little improvement in prediction accuracy.
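The preprocessing pipeline compared above can be illustrated with a minimal, stdlib-only Python sketch: random undersampling to a 1:5 crash-to-non-crash ratio, SMOTE expansion of the crash class, a logistic regression fit, and AUC evaluation. This is not the authors' implementation. The data are synthetic, the two features are hypothetical stand-ins for loop-detector variables, the function names (`random_undersample`, `smote`, `fit_logistic`, `auc`) are illustrative assumptions, the logistic regression is fitted by plain gradient descent, and matched case-control sampling is not shown.

```python
import math
import random


def random_undersample(crash, non_crash, ratio=5, seed=0):
    """Keep every crash record; draw ratio * len(crash) non-crash records without replacement."""
    rng = random.Random(seed)
    k = min(len(non_crash), ratio * len(crash))
    return list(crash), rng.sample(non_crash, k)


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE: place each synthetic point on the segment between a minority sample
    and one of its k nearest minority neighbours (needs at least 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; w[-1] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, X):
    # Predicted crash probability for each record.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) for xi in X]


def auc(y_true, scores):
    """AUC as the probability that a random crash outscores a random non-crash (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, hypothetical 2-feature records; crashes are rare and shifted upward.
    crash = [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(40)]
    non_crash = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(4000)]
    X_test = crash[30:] + non_crash[3000:]
    y_test = [1] * 10 + [0] * 1000

    # Undersampled training set at crash : non-crash = 1 : 5.
    c_tr, n_tr = random_undersample(crash[:30], non_crash[:3000], ratio=5)
    w = fit_logistic(c_tr + n_tr, [1] * len(c_tr) + [0] * len(n_tr))
    print("undersampled AUC:", round(auc(y_test, predict(w, X_test)), 3))

    # Same set with the crash class expanded by SMOTE until the classes are balanced.
    c_smote = c_tr + smote(c_tr, len(n_tr) - len(c_tr))
    w_s = fit_logistic(c_smote + n_tr, [1] * len(c_smote) + [0] * len(n_tr))
    print("undersampled + SMOTE AUC:", round(auc(y_test, predict(w_s, X_test)), 3))
```

The sketch avoids third-party dependencies so that it runs as is; in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally replace the hand-written `smote` function.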
