Handling Imbalanced Data in Road Crash Severity Prediction by Machine Learning Algorithms

Nicholas Fiorentini, Massimo Losa

doi:10.3390/infrastructures5070061

Abstract

Crash severity is undoubtedly a fundamental aspect of a crash event. Although machine learning algorithms for predicting crash severity have recently attracted interest from the academic community, many studies neglect the fact that crash datasets are acutely imbalanced. Overlooking this fact generally leads to weak classifiers for predicting the minority class (crashes with higher severity). In this paper, to handle imbalanced accident datasets and provide a better prediction for the minority class, we use random undersampling of the majority class (RUMC). By employing an imbalanced and a RUMC-based balanced training set, we propose the calibration, validation, and evaluation of four crash severity predictive models: random tree, k-nearest neighbor, logistic regression, and random forest. Accuracy, true positive rate (recall), false positive rate, true negative rate, precision, F1-score, and the confusion matrix have been calculated to assess the performance. Outcomes show that RUMC-based models make the classifiers more reliable at detecting fatal crashes and those causing injury. Indeed, in imbalanced models, the true positive rate for predicting fatal crashes and those causing injury ranges from 0% (logistic regression) to 18.3% (k-nearest neighbor), while for the RUMC-based models, it ranges from 52.5% (RUMC-based logistic regression) to 57.2% (RUMC-based k-nearest neighbor). Organizations and decision-makers could make use of RUMC and machine learning algorithms in predicting the severity of a crash occurrence, managing the present, and planning the future of their works.

Highlights

The latest 2018 report of the World Health Organization states that more than 1.35 million people die each year from causes related to road accidents [1].
True Positive Rate (TPR) and recall have the same meaning; we report both because TPR is usually presented along with False Positive Rate (FPR), while precision is commonly accompanied by recall.
Using a balanced training set, we note a decrease in performance for predicting the Property Damage Only (PDO) class and a significant increase in performance for the fatal and injury (F + I) class.
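
The relationship between the metrics named in the abstract can be illustrated with a minimal sketch. The function name `binary_metrics` and all counts below are hypothetical and chosen only for illustration; they are not results from the paper. The positive class is assumed to be F + I and the negative class PDO.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive common metrics from binary confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)          # true positive rate, identical to recall
    fpr = ratio(fp, fp + tn)          # false positive rate
    tnr = ratio(tn, tn + fp)          # true negative rate, equals 1 - FPR
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * tpr, precision + tpr)
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "recall": tpr,
        "fpr": fpr,
        "tnr": tnr,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts: 50 F + I crashes (positive), 150 PDO crashes (negative).
balanced_like = binary_metrics(tp=30, fp=20, tn=130, fn=20)

# A classifier that always predicts PDO on the same made-up data:
# accuracy stays high while TPR for the minority class is zero.
always_majority = binary_metrics(tp=0, fp=0, tn=150, fn=50)

print(balanced_like)
print(always_majority)
```

The second call shows why accuracy alone can hide a classifier that never detects the minority class, which is the situation the abstract reports for logistic regression on the imbalanced training set.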

Summary

Introduction

The latest 2018 report of the World Health Organization states that more than 1.35 million people die each year from causes related to road accidents [1]. It declared that road accidents are the leading cause of death for children and young people aged between 5 and 29 years. These statements push us to research and improve processes aimed at enhancing the road safety level of infrastructures, moderating the number of accidents, and evaluating the key factors that cause or contribute to an accident. Learning from datasets that include occasional events usually provides biased classifiers: they have higher predictive accuracy over the majority class, but weaker predictive capacities over the minority class [2,3,4].
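
Random undersampling of the majority class, the technique named in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_undersample_majority`, the 1:1 target ratio, the fixed seed, and the toy labels are all hypothetical. The sketch assumes it is applied to the training set only.

```python
import random
from collections import Counter


def random_undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class samples until that class matches
    the size of the smallest class (assumed 1:1 target ratio)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    target = min(counts.values())

    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    other_idx = [i for i, y in enumerate(labels) if y != majority]

    # Sample without replacement from the majority class only.
    kept = rng.sample(majority_idx, target)
    idx = sorted(kept + other_idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy training set: 90 PDO crashes and 10 F + I crashes (made-up sizes).
train_labels = ["PDO"] * 90 + ["F+I"] * 10
train_rows = list(range(100))  # stand-ins for feature vectors

balanced_rows, balanced_labels = random_undersample_majority(
    train_rows, train_labels
)
print(Counter(balanced_labels))
```

Undersampling discards majority-class information, which is consistent with the highlighted trade-off: performance on the PDO class decreases while performance on the F + I class increases.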

Methods

Results

Conclusion

Journal: Infrastructures
Publication Date: Jul 20, 2020
Citations: 86
License type: CC BY 4.0

Similar Papers

A novel one-vs-rest consensus learning method for crash severity prediction
Syed Fawad Hussain ... Muhammad Mansoor Ashraf
Expert Systems With Applications | VOL. 228
11 May 2023

Artificial Intelligence-related Literature in Transplantation: A Practical Guide.
Sook Hyeon Park ... Sanjay Mehrotra
Transplantation | VOL. 105
18 Aug 2020

Analysing the Severity and Frequency of Traffic Crashes in Riyadh City Using Statistical Models
Saleh Altwaijri ... Mohammed Quddus
International Journal of Transportation Science and Technology | VOL. 1
01 Dec 2012

Prediction model of crash severity in imbalanced dataset using data leveling methods and metaheuristic optimization algorithms
Akbar Danesh ... Hamzeh Zakeri
International Journal of Crashworthiness | VOL. ahead-of-print
12 Jan 2022
