A Data-Driven Study to Investigate the Causes of Severity of Road Accidents

Arindam Kumar Paul, Pritam Khan Boni, Md Zahidul Islam

doi:10.1109/icccnt54827.2022.9984499

Abstract

Traffic safety is one of the most pressing concerns worldwide today. Thousands of lives are lost to road accidents every day, in every country and geographical region. Road accidents have increased with the rapid development of the transportation sector. Governments have therefore come to regard the issue as one of the vital problems of our time and have begun taking steps to reduce road accidents and the damage they cause. A huge amount of data on road accidents and traffic safety is now available from government sources. Analysing that massive amount of statistical data to investigate the contributory factors behind road accidents is impractical and time-consuming. In this study we combined mathematical and statistical tools to discover the factors most responsible for road accident severity. Because machine learning and data science have developed rapidly as means of investigating black-box systems and predict outcomes successfully in most cases, we used three data mining algorithms, well known as feature selection algorithms in machine learning, to investigate the most contributory factors behind road crash severity. The large data set consists of road accident severity as the output variable and 15 input (predictor) variables, and we investigated the influence of each independent predictor on the output variable. We used neighborhood component analysis, a K-nearest-neighbours-based supervised learning algorithm, and a partial dependence plot and individual conditional expectation based feature selection algorithm to investigate the most contributory factors behind road accident severity. After applying those algorithms to the data set, we plotted the importance of the factors to visualise our results and provided a table showing the numerical results of each individual algorithm.
Finally, to validate and test our findings, we used support vector machine classification with all 15 variables and with the eight most influential variables according to our findings. By comparing the results of the two support vector machine classification models, we demonstrated the validity of our study.
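
The partial dependence (PD) and individual conditional expectation (ICE) idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model, the synthetic rows, the grid, and the choice of scoring a feature by the range of its PD curve are all assumptions made for illustration, since the abstract does not state how importance is derived from the curves.

```python
import random
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """One ICE curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature under study
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """PD curve: pointwise average of the ICE curves over all rows."""
    curves = ice_curves(predict, rows, feature, grid)
    return [mean(column) for column in zip(*curves)]


def pd_importance(predict, rows, feature, grid):
    """Assumed importance score: range (max - min) of the PD curve."""
    pd_curve = partial_dependence(predict, rows, feature, grid)
    return max(pd_curve) - min(pd_curve)


def toy_model(row):
    """Hypothetical stand-in for a fitted severity model; ignores row[2]."""
    return 3.0 * row[0] + 0.5 * row[1]


random.seed(0)
rows = [[random.random() for _ in range(3)] for _ in range(200)]  # synthetic data
grid = [i / 10 for i in range(11)]                                # values 0.0 .. 1.0

scores = [pd_importance(toy_model, rows, j, grid) for j in range(3)]
ranking = sorted(range(3), key=lambda j: scores[j], reverse=True)
print("importance scores:", scores)
print("features ranked by importance:", ranking)
```

A flat PD curve yields a score near zero, so a feature the model ignores ranks last. In a study like this one, the top-ranked features would then form the reduced predictor set used for the validation classifier.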

Full Text