The escalating need for proactive safety measures, coupled with advances in data collection and analytical techniques, has substantially refined the accuracy of crash count predictions, shifting from annual scales to finer daily or hourly estimates. This research focuses on recurrent neural networks (RNNs), specifically the long short-term memory (LSTM) model, which is well suited to handling sequential data in time series prediction. Two input-data considerations are paramount: whether to incorporate temporal features alongside endogenous historical target values, and how to set an appropriate window size for the data input. Despite the critical nature of these choices, studies investigating both concurrently under controlled conditions are scarce. This research addresses this gap by assessing scenarios that combine distinct temporal treatments with different window sizes, employing an LSTM model with uniform fine-tuned parameters to ensure a fair comparison. Findings indicate significant variation in performance among models that use different window sizes and month predictor integration under identical RNN structures and LSTM configurations. The best model outperformed the least effective by roughly 30% in root mean square error (RMSE) and in the ranking correlation between predicted and actual crash counts on the test datasets. Unlike the treatment of the seasonality predictor, the choice of window size did not lead to statistically significant differences in model performance. In general, a window size of 3 performed worst under most conditions, followed by a size of 14, while sizes of 7 and 28 performed best in a nearly equal number of cases.
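The sketch below is not the authors' code; it is a minimal illustration, under assumed names such as `daily_counts`, `months`, and `window_size`, of how daily crash counts could be arranged into sliding-window LSTM inputs with an optional month-of-year predictor alongside the endogenous lagged counts, which is the kind of input treatment the study compares.

```python
# Minimal sketch of sliding-window input construction for an LSTM crash-count
# model. All variable names and the synthetic data are illustrative assumptions.
import numpy as np
import tensorflow as tf

def make_windows(daily_counts, months, window_size, use_month=True):
    """Build (samples, window_size, features) inputs and next-day targets."""
    X, y = [], []
    for t in range(window_size, len(daily_counts)):
        lags = daily_counts[t - window_size:t]
        if use_month:
            # Pair each lagged count with a normalized month-of-year indicator
            # (one possible way to encode the seasonality predictor).
            feats = np.column_stack([lags, months[t - window_size:t] / 12.0])
        else:
            feats = lags.reshape(-1, 1)
        X.append(feats)
        y.append(daily_counts[t])
    return np.array(X, dtype="float32"), np.array(y, dtype="float32")

# Synthetic example using a window size of 7, one of the sizes compared.
rng = np.random.default_rng(0)
daily_counts = rng.poisson(5, size=365).astype("float32")
months = np.repeat(np.arange(1, 13), 31)[:365]
X, y = make_windows(daily_counts, months, window_size=7, use_month=True)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=X.shape[1:]),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```

Rerunning this setup with `window_size` set to 3, 7, 14, or 28 and with `use_month` toggled on or off mirrors, in simplified form, the controlled comparison the abstract describes.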