Among the many traffic forecasting studies, comparatively few focus on long-term traffic prediction, such as 24-hour prediction. Traffic data such as traffic speed are relatively easy to obtain, but similarly reliable and accessible feature data, such as weather or events, can be difficult to obtain depending on the location and the availability of service providers. This problem becomes more significant when global coverage is considered. To mitigate the issue of limited feature data, this paper proposes augmenting existing data: the dataset is sorted into appropriate clusters, and the cluster assignment is used as an additional feature, improving the dataset's quality and enabling more accurate training. Specifically, the paper proposes a long-term traffic forecasting model, called Cluster Augmented LSTM (CAL), that applies a novel time-series segmentation method paired with data clustering and classification via a Convolutional Neural Network (CNN) as additional pre-processing to compensate for the lack of traffic data and features, and then uses Long Short-Term Memory (LSTM) for long-term traffic prediction. The proposed model is compared with existing machine learning models and evaluated using the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) performance metrics. A comparison between LSTM and Gated Recurrent Units (GRU) shows that GRU tends to outperform LSTM in most cases; however, the best-performing result for the proposed method still uses LSTM. The final results show that the proposed CAL model achieves better results by 1.42%–1.76% in MAPE and 0.25–0.41 in RMSE.
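
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of their standard definitions. It is not the authors' evaluation code; the function names and the sample speed values are hypothetical and chosen only for illustration.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes no value in `actual` is zero (the ratio is undefined there).
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root Mean Squared Error, in the same unit as the inputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical traffic speeds (km/h), not data from the paper.
observed = [50.0, 40.0, 80.0, 100.0]
forecast = [45.0, 44.0, 80.0, 90.0]

mape_value = mape(observed, forecast)
rmse_value = rmse(observed, forecast)
```

MAPE is scale-free and expressed as a percentage, while RMSE carries the unit of the predicted quantity and penalizes large errors more heavily, which is why the two are commonly reported together.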