Time series forecasting has acquired immense research importance and has vast applications in air pollution monitoring. This work investigates the abilities of various existing techniques when applied to short-term, high-granularity time series forecasting of PM2.5. More specifically, a comparative study is provided that takes into account both popular and lesser-used models in this area. The study considers ten well-defined models: ARIMA (auto-regressive integrated moving average), SARIMA (seasonal ARIMA), SES (single exponential smoothing), DES (double exponential smoothing), TES (triple exponential smoothing), ANN (artificial neural network), DT (decision tree), kNN (k-nearest neighbor), LSTM (long short-term memory) and MCFO (Markov chain first order). A framework has been built that categorizes the models, implements them under an identical execution environment and forecasts succeeding values. Implementation has been carried out over five data sets of real-world air pollution time series, collected from five differently located government-operated monitoring stations over a period of 1 year (July 2018-June 2019). Rigorous statistical analysis has been performed that yields insight into the nature and variability of these time series data. Forecasting has been carried out on a short-term basis with a focus on high granularity, and three different lengths of forecast horizon (1 day, 1 week, and 1 month) have been tested. Finally, the models have been compared in terms of three performance measures, namely RMSE (root mean squared error), MAE (mean absolute error) and MAPE (mean absolute percentage error). The comparative results, verified with multiple datasets, show that all the models have lower error for a shorter forecast horizon, with LSTM providing the best performance.
Machine learning and deep learning models are found to be superior for longer forecast horizons, with kNN achieving the best accuracy, whereas ARIMA shows significant performance degradation as the forecast horizon lengthens. Moreover, TES, DT, kNN, LSTM and MCFO are found to adapt well to the shape and variability of the data. Note that performance has been studied across several lengths of high-granularity forecast horizon and over multiple datasets, which adds to the value of this work.
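
For reference, the three error measures named above can be written out directly. The following is a minimal sketch in plain Python, not the study's implementation: the function names and the sample readings are hypothetical, and the MAPE helper assumes that no observed value is zero.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, forecast):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: every observed value is non-zero, otherwise the
    ratio below is undefined.
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical PM2.5 readings and forecasts, for illustration only.
actual = [40.0, 50.0, 80.0, 100.0]
forecast = [44.0, 45.0, 88.0, 90.0]

print(f"RMSE = {rmse(actual, forecast):.3f}")
print(f"MAE  = {mae(actual, forecast):.3f}")
print(f"MAPE = {mape(actual, forecast):.3f} %")
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE is scale-free, which is one reason comparative studies often report all three together.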