Practical Minimum Sample Size for Road Crash Time-Series Prediction Models

Fady M A Hassouna, Khaled Al-Sahili, Valeria Vignali

doi:10.1155/2020/6672612

Abstract

Road crashes are a problem facing the transportation sector. Crash data in many countries are available only for the past 10 to 20 years, which makes it difficult to determine whether the data are sufficient to establish reasonable and accurate prediction rates. In this study, the effect of sample size (the number of years used to develop a prediction model) on the crash prediction accuracy of the autoregressive integrated moving average (ARIMA) method was investigated using crash data for the years 1971–2015. Based on the availability of annual crash records, road crash data for four selected countries (Denmark, Turkey, Germany, and Israel) were used to develop crash prediction models based on different sample sizes (45, 35, 25, and 15 years). Then, crash data for 2016 and 2017 were used to verify the accuracy of the developed models. Furthermore, crash data for Palestine were used to test the validity of the results. The data included fatality, injury, and property damage crashes. The results showed similar trends in the models’ prediction accuracy for all four countries when predicting road crashes for year 2016. Decreasing the sample size reduced prediction accuracy down to a sample size of 25 years; accuracy then increased for the 15-year sample size. For year 2017, by contrast, there was no specific trend in prediction accuracy, and a wider range of prediction error was obtained. It is concluded that prediction accuracy varies with the socioeconomic conditions, traffic safety programs, and development conditions of the country over the study years. For countries with steady and stable conditions, modeling using larger sample sizes would yield higher accuracy models with higher prediction capabilities. For countries with less steady and stable conditions, modeling using smaller sample sizes (15 years, for example) would lead to high accuracy models with good prediction capabilities.
Therefore, it is recommended that the socioeconomic and traffic safety program status of the country be considered before selecting the practical minimum sample size that would give an acceptable prediction accuracy, thereby saving the effort and time spent in collecting data (more is not always better). Moreover, based on the data analysis results, long-term ARIMA prediction models should be used with caution.

Highlights

Time-series analysis is a common technique used by numerous research studies to analyze trends of certain phenomena and to predict future conditions. This technique has been applied in various fields: engineering, scientific, social, medical, etc.
Autoregressive integrated moving average (ARIMA) model, Zimbabwe: a white noise process is difficult to predict because the values at different times are statistically independent. ARIMA model, Ghana: the trends and patterns of road crashes were studied, and a five-year prediction was made. The study showed an increasing trend for the coming five years.
During the last three decades, an important question has arisen: “What is the reasonable minimum sample size for appropriate time-series modeling?” There is a common rule-of-thumb in statistics that more is better.

Summary

Introduction

Time-series analysis is a common technique used by numerous research studies to analyze trends of certain phenomena and to predict future conditions. This technique has been applied in various fields: engineering, scientific, social, medical, etc. A scan through the web showed many researchers asking, “What is the reasonable minimum sample size for appropriate modeling?” It is a common rule-of-thumb in statistics that more is better, including for time-series analysis. Over the past fifty years, which is the recommended minimum sample size on a yearly basis for ARIMA models, several countries moved up from being “underdeveloped” to the level of a “developed” country. These changes greatly affect road crash patterns, to the point that ARIMA models, or other time-series models, might not work properly. The issue at hand in this paper is not to question the statistical requirements for the minimum sample size when modeling time-series and forecasting with ARIMA models; this is left for statisticians. The question is rather, for road crash analysis, what the “reasonable” and “practical” sample size would be for modeling time-series and forecasting using the Box and Jenkins ARIMA models with “appropriate” confidence limits. The paper answers this question by modeling road crashes from four countries, testing different sample sizes for each country, and assessing their prediction capabilities and the associated significance levels for each country, using the ARIMA models. The results were verified using road crash data for Palestine.
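
The sample-size comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: instead of fitted Box and Jenkins models, it uses the simplest ARIMA special case, ARIMA(0,1,0) with drift (a random walk with drift), so that it runs with the standard library only. The function names `drift_forecast` and `window_errors` are hypothetical, and the series is a synthetic placeholder, not crash data from any country.

```python
def drift_forecast(history, steps):
    """Forecast with ARIMA(0,1,0) plus drift.

    The h-step forecast is the last observation plus h times the
    mean of the first differences of the training window.
    """
    diffs = [b - a for a, b in zip(history, history[1:])]
    drift = sum(diffs) / len(diffs)
    return [history[-1] + drift * h for h in range(1, steps + 1)]


def window_errors(series, holdout, sample_sizes):
    """Absolute percentage error per sample size and forecast horizon.

    For each sample size n, only the most recent n observations are
    used for training; the forecasts are compared with the holdout.
    """
    results = {}
    for n in sample_sizes:
        train = series[-n:]
        preds = drift_forecast(train, len(holdout))
        results[n] = [abs(p - a) / a * 100 for p, a in zip(preds, holdout)]
    return results


# Synthetic placeholder for 45 annual counts (standing in for 1971-2015).
# ASSUMPTION: invented numbers with a trend reversal after 20 years,
# mimicking a country whose conditions changed during the study period.
series = [1000 + 20 * t for t in range(20)] + [1380 - 10 * t for t in range(1, 26)]

# Synthetic placeholder for the two verification years (2016 and 2017).
holdout = [1120, 1110]

errors = window_errors(series, holdout, [45, 35, 25, 15])
for n, errs in errors.items():
    print(f"sample size {n}: error year 1 = {errs[0]:.2f}%, year 2 = {errs[1]:.2f}%")
```

In this synthetic series the shorter windows fit the recent trend better because the longer windows include the earlier, reversed trend. That outcome is a property of the invented data and only illustrates the mechanism; it does not reproduce the paper's results.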

Literature Review

Data Analysis and Discussion

Findings

Case of Palestine

Conclusions