Abstract
Traffic collisions affect millions around the world and are the leading cause of death for children and young adults. Canada’s road safety plan therefore aims to reduce collision injuries and fatalities, with a vision of having the safest roads in the world. We aim to predict fatalities in collisions on Canadian roads and to discover the causes of fatalities through exploratory data analysis and machine learning techniques. We analyse vehicle collisions from Canada’s National Collision Database (1999–2017). Using data mining methodologies, we investigate association rules and key contributing factors that lead to fatalities. We then propose two supervised classification models, Lasso Regression and XGBoost, to predict fatalities. Our analysis shows the deadliness of head-on collisions, especially in non-intersection areas lacking traffic control systems. We also reveal that most collision fatalities occur in non-extreme weather and road conditions. Our prediction models show that the best classifier of fatalities is XGBoost, with 83% accuracy. Its most important features are “collision configuration” and “used safety devices”, which outrank attributes such as vehicle year, collision time, and the age or sex of the individual. Our exploratory and predictive analyses reveal the importance of road design and traffic safety education.
Highlights
Each year, traffic collisions kill approximately 1.35 million people around the world and are the leading cause of death for children and young adults [1]
In our initial analysis of the 19 years of data, from 1999 to 2017, we found that 98.41% of the collisions resulted in no fatality and 1.59% resulted in at least one fatality
The association rules for collisions that result in at least one fatality share a trait that appears in every single rule: head-on collisions
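As a minimal illustration of how an association rule such as "head-on collision implies fatality" can be scored, the sketch below computes the standard support, confidence, and lift measures. The records are invented toy data, not values from the National Collision Database, and the function name `rule_metrics` and the item labels are hypothetical; the paper's own mining procedure may differ.

```python
from collections import namedtuple

RuleMetrics = namedtuple("RuleMetrics", "support confidence lift")


def rule_metrics(records, antecedent, consequent):
    """Score the rule antecedent -> consequent over a list of item sets.

    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    lift       = confidence / P(consequent)
    """
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)
    n_c = sum(1 for r in records if consequent <= r)
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return RuleMetrics(support, confidence, lift)


# Toy collision records (hypothetical, for illustration only).
records = [
    {"head_on", "non_intersection", "fatal"},
    {"head_on", "non_intersection", "fatal"},
    {"head_on", "intersection", "non_fatal"},
    {"rear_end", "intersection", "non_fatal"},
    {"rear_end", "intersection", "non_fatal"},
    {"rear_end", "non_intersection", "non_fatal"},
    {"side_swipe", "non_intersection", "non_fatal"},
    {"side_swipe", "intersection", "fatal"},
]

metrics = rule_metrics(records, {"head_on"}, {"fatal"})
print(metrics)
```

A lift above 1 means the antecedent occurs together with the consequent more often than would be expected if the two were independent, which is the sense in which a trait can be said to characterise fatal collisions.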
Summary
Traffic collisions kill approximately 1.35 million people around the world and are the leading cause of death for children and young adults [1]. In 2017, Canada’s number of motor vehicle fatalities and injuries reached 1,841 and 154,886, respectively [2]. In Japan [9], vehicle-to-pedestrian accident data were used to predict the seriously injured body regions of pedestrians by considering various factors, including the accident year, vehicle type, travel speed, and pedestrian gender and age. The research in [10] used data on road accidents involving heavy goods vehicles and buses in 27 European Union countries over 10 years and analysed safety parameters such as area type, season of the year, weekday, and casualty age and gender. Studies in Colombia [11], Taiwan [12], and Serbia [13] separately examined spatial features such as road geometry and precipitation, as well as temporal attributes such as hour and day of the week, whereas a study in India [14] analysed individuals’ characteristics, such as driving patterns and drunk driving, to describe traffic accidents and casualties.