Abstract

Traffic accidents remain frequent, and traffic safety has become a major social concern. Many factors contribute to accidents, including location, time period, the driver’s emotional state, weather, and other complex, uncertain factors. As a result, the occurrence of traffic accidents is nonlinear, so it is necessary to explore correlations in the data from many different aspects in order to reduce risk. We use the R language to analyze and visualize traffic data and to show how the variables are related. After data preprocessing and data selection, we use the remapB and remapH functions of the R Remap package to plot the accident locations and an accident heat map, from which high-frequency accident locations can be identified. We then model the data with decision tree, linear regression, and random forest algorithms. Each model is validated against the actual results, and the most accurate one is selected so that it can be used to make predictions on similar data in the future. The ultimate goal of the analysis is to choose the most accurate model after validating the models and analyzing the characteristics of the data and the relationship between each model and the data.

Highlights

  • At present, China’s national economy is developing rapidly

  • The text-type accident location in the original data is merged with the geographical coordinates of the site (latitude and longitude) to form a new data frame, which is used as the database

  • The forest consists of many decision trees, and there is no association between the individual decision trees in the random forest
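
The merge of text-type locations with coordinates described in the highlights can be sketched roughly as follows. This is a minimal Python illustration, not the paper's R code: the function name `merge_coordinates`, the field names, the site names, and the coordinate values are all hypothetical, and dropping records that have no coordinates is an assumption made only for this sketch.

```python
def merge_coordinates(accidents, coordinates):
    """Attach latitude and longitude to each accident record by location name.

    accidents:   list of dicts, each with a text "location" field (hypothetical schema)
    coordinates: dict mapping location name -> (latitude, longitude)
    """
    merged = []
    for record in accidents:
        coords = coordinates.get(record["location"])
        if coords is None:
            # Assumption for this sketch: skip records that could not be geocoded.
            continue
        lat, lon = coords
        # Build a new record so the original data is left unchanged.
        merged.append({**record, "lat": lat, "lon": lon})
    return merged


# Made-up example data, for illustration only.
accidents = [
    {"location": "Site A", "hour": 8},
    {"location": "Site B", "hour": 17},
    {"location": "Site C", "hour": 23},  # no coordinates available
]
coordinates = {
    "Site A": (39.90, 116.40),
    "Site B": (31.23, 121.47),
}

merged = merge_coordinates(accidents, coordinates)
for row in merged:
    print(row)
```

Each merged record carries both the original attributes and the coordinates, which is the shape a mapping or heat-map step would need.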


Summary

Research Background

China’s national economy is developing rapidly, and motor vehicle ownership, the number of drivers, and road traffic flow continue to rise. Traffic safety has become a key factor affecting the safety of people’s lives and property, and it restricts the benefits of social and economic development. The development and progress of road traffic have brought great convenience, economic benefits, and social prosperity to human society. In 2011, after the ban on drunk driving, national car ownership was 78 million, there were 210,812 road traffic accidents, and the death toll was as high as 62,387. In Japan, by comparison, car ownership was more than 7,000 and the number of traffic accidents was up to three times that of China, while the death toll was only 4,611. There is still a huge gap between China and developed countries in traffic safety. Because traffic accident data are time-sensitive and must be both timely and accurate, they need to be processed quickly and efficiently. The prediction results of each model are compared with the actual data, and a confusion matrix is used to compare the models by accuracy and by kappa value (a measure of agreement between observers).
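
As a rough illustration of the comparison step, accuracy and Cohen's kappa can both be read off a confusion matrix. The sketch below is in Python rather than the R used in the paper, the function name `accuracy_and_kappa` is hypothetical, and the counts are invented for the example; they are not results from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of cases whose actual class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix contains no observations")

    # Observed agreement: share of cases on the diagonal (this is the accuracy).
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Invented counts for a two-class problem, for illustration only.
example = [
    [40, 10],  # actual class 0: 40 predicted as 0, 10 predicted as 1
    [5, 45],   # actual class 1: 5 predicted as 0, 45 predicted as 1
]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement that would be expected by chance, so two models with similar accuracy can still differ noticeably in kappa when the classes are unbalanced.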

Research Purposes
Related research
Data Preprocessing
Geographic Location is Converted to Latitude and Longitude
Numerical Analysis
Consolidate Data
Draw the Accident Map
Draw the Accident Heat Map
Modeling Prediction
Random Forest
Bagging Decision Tree
Experiment Analysis
Findings
Conclusion
