Abstract

This paper presents the statistical methods and models we used to identify factors that lead to fatal car crashes and high damage cost. The findings are intended to help the Virginia DMV make adjustments that reduce the number of fatal and high-cost crashes. We used data from 2010 to 2014 for both the fatality analysis and the damage cost analysis; data from 2015 was used for the fatality analysis only. The first part of the paper describes how we identified factors behind fatal car crashes. Because the data are unbalanced, we first subsampled the non-fatal crashes and applied a higher weight to fatal crashes. We then used a logistic regression model to predict whether an accident is fatal. To select the more important features, we used only numeric factors with a correlation value greater than 0.1. The logistic regression achieved a recall of 40% in prediction. We also applied decision trees to the fatality analysis and built two models, one for the 2010–2014 data and one for the 2015 data. The second part of the paper discusses how we identified factors behind damage cost. Because the values of the damage cost variable are unbalanced, we propose a two-stage method for finding its critical factors. First, we used k-nearest neighbors (KNN) to predict whether the damage cost is zero. Second, we fit a Lasso regression to the crashes with nonzero damage cost and identified the factors that lead to damage cost.
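
As an illustration only, the data preparation described for the fatality analysis (subsampling non-fatal crashes, weighting fatal crashes more heavily, keeping numeric factors with correlation above 0.1, and scoring by recall) might be sketched in Python as follows. This is not the paper's code. The column names (`speed`, `posted_limit`, `weather`, `fatal`), the toy records, the 3:1 subsampling ratio, the inverse-frequency weighting, and the use of absolute Pearson correlation with the fatal label are all assumptions made for the example; the abstract does not specify them. Model fitting itself is omitted.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def subsample_majority(rows, label_key, ratio, seed=0):
    """Keep every fatal row and at most `ratio` non-fatal rows per fatal row."""
    rng = random.Random(seed)
    fatal = [r for r in rows if r[label_key] == 1]
    non_fatal = [r for r in rows if r[label_key] == 0]
    k = min(len(non_fatal), ratio * len(fatal))
    return fatal + rng.sample(non_fatal, k)


def class_weights(rows, label_key):
    """Inverse-frequency weights (an assumed scheme): fatal rows count more."""
    n_pos = sum(r[label_key] for r in rows)
    n_neg = len(rows) - n_pos
    return {0: 1.0, 1: n_neg / n_pos}


def select_features(rows, label_key, threshold=0.1):
    """Names of numeric features whose |correlation with the label| exceeds threshold."""
    labels = [r[label_key] for r in rows]
    selected = []
    for name in rows[0]:
        if name == label_key:
            continue
        values = [r[name] for r in rows]
        if not all(isinstance(v, (int, float)) for v in values):
            continue  # skip non-numeric factors
        if abs(pearson(values, labels)) > threshold:
            selected.append(name)
    return selected


def recall(y_true, y_pred):
    """Fraction of truly fatal crashes that were predicted as fatal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy records: 10 fatal and 190 non-fatal crashes.
rows = []
for i in range(200):
    fatal = 1 if i % 20 == 0 else 0
    rows.append({
        "speed": (80 if fatal else 40) + i % 7,   # numeric, related to the label
        "posted_limit": 55,                        # numeric but constant
        "weather": "rain" if i % 3 == 0 else "clear",  # non-numeric, skipped
        "fatal": fatal,
    })

balanced = subsample_majority(rows, "fatal", ratio=3)
weights = class_weights(balanced, "fatal")
features = select_features(balanced, "fatal")
```

The selected features and class weights would then be passed to a classifier such as a logistic regression, and `recall` would be computed on held-out predictions.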
