Impact of Starting Outlier Removal on Accuracy of Time Series Forecasting

Abstract

An outlier at the starting point of a univariate time series degrades forecasting accuracy. The starting outlier is effectively removed only by setting it equal to the value at the second time point. Forecasting accuracy improves significantly after the removal, and the favorable impact of the removal is strong overall. The impact is least favorable for time series with exponential rise. Even for the worst-case time series, on average only 7% to 11% of the forecasts made after the starting outlier removal are worse than they would be without the removal.
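The removal rule described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the starting point is taken as already judged to be an outlier, the toy series is made up, and `mean_forecast` is a hypothetical stand-in forecaster, not the forecasting method evaluated in the paper.

```python
def remove_starting_outlier(series):
    """Return a copy of the series whose first value equals the second."""
    if len(series) < 2:
        return list(series)
    cleaned = list(series)
    cleaned[0] = cleaned[1]
    return cleaned


def mean_forecast(series):
    """Hypothetical stand-in forecaster: predict the mean of the history."""
    return sum(series) / len(series)


# Made-up series: level near 10 with an outlying first observation.
raw = [50.0, 10.0, 11.0, 9.0, 10.0]
cleaned = remove_starting_outlier(raw)

print(cleaned)                 # first value now equals the second
print(mean_forecast(raw))      # pulled upward by the starting outlier
print(mean_forecast(cleaned))  # close to the level of the remaining points
```

The function returns a copy rather than modifying its input, so forecasts with and without the removal can be compared on the same data.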

Similar Papers
  • Conference Article
  • 10.31649/mccs2024.2-14
Comparative Analysis of Forecasting Models for an Atmospheric Air Quality Indicator
  • Nov 21, 2024
  • Dmytro Shmundiak

Periodic time series have many applications in our lives. Examples of periodic time series are air quality indicators, financial market indicators, meteorological parameters, etc. Because of this, the analysis and forecasting of periodic time series is a widespread and interesting scientific topic. One of the main problems in the analysis of periodic time series is the determination of the parameters of the seasonality of this series and the identification and elimination of abnormal values that can significantly affect the accuracy of data forecasting. In this work, a comparative analysis of the previously developed models and approaches for air quality indicator forecasting is given. The research is based on real data from the EcoCity public air quality monitoring network. A brief description of the method of identifying the seasonality parameters of the time series based on the decomposition of this time series and an approach to finding local anomalies in a time series based on the results of series decomposition is given. The results of the described models were used to forecast the PM2.5 dust index of one of the air quality monitoring stations in the Vinnytsia region. The Python programming language was used to automate the forecasting process, and the program code itself was implemented in the Kaggle system, a web platform from Google for machine learning engineers. The Prophet time series model was used for forecasting. A comparison table of the forecast accuracy of the Prophet model with default settings and with custom configuration based on the data from developed models and approaches was provided. The study and analysis showed that using both developed methods helps to reduce the forecasting error for the air quality indicator. Compared to the accuracy of the Prophet model with the default parameters, it was possible to reduce the MAE error value by 30% and the RMSE by 21%. 
This proves that these methods are effective for the analysis and forecasting of time series, including time series of air quality indicators.

  • Research Article
  • Cite count: 1
  • 10.3233/ida-215811
Anomaly repair-based approach to improve time series forecasting
  • Mar 14, 2022
  • Intelligent Data Analysis
  • Thuy Huynh Thi Thu + 2 more

Time series forecasting has many practical applications in a variety of domains such as commerce, finance, medicine, weather, environment, and transportation. Many methods have been developed for time series forecasting. However, most of them pay no attention to anomalies in time series, even though time series are sensitive to anomalies. Anomaly patterns have a negative effect on the accuracy of time series forecasting. In this paper, we propose a novel anomaly repair-based approach to improve time series forecasting when anomalies are present. In our approach, an effective time series forecasting framework, EPL_S_X, is proposed with anomaly smoothing as a pre-processing stage and any existing time series prediction algorithm X. In particular, our proposed approach consists of three steps: detecting anomalies, repairing anomalies by using our smoothing method, and forecasting the time series from the preprocessed data. Experimental results on several time series datasets reveal that our proposed approach remarkably improves the accuracy of many existing time series forecasting methods. It also outperforms the two robust time series forecasting methods that are based on exponential and Holt-Winters smoothing. With such better prediction performance, our approach is not only more effective but also more useful when dealing with anomalies in time series forecasting.
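The three-step structure (detect, repair, forecast) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the median/MAD detector, the neighbour-mean repair, and the simple exponential smoothing forecaster are generic substitutes, not the EPL_S smoothing method or the algorithm X used in the paper, and the data are made up.

```python
from statistics import median


def detect_anomalies(series, threshold=3.0):
    """Flag indices whose distance from the median exceeds threshold * MAD."""
    center = median(series)
    mad = median(abs(x - center) for x in series)
    if mad == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - center) > threshold * mad]


def repair_anomalies(series, indices):
    """Replace each flagged value with the mean of its unflagged neighbours."""
    flagged = set(indices)
    repaired = list(series)
    for i in indices:
        neighbours = [series[j] for j in (i - 1, i + 1)
                      if 0 <= j < len(series) and j not in flagged]
        if neighbours:
            repaired[i] = sum(neighbours) / len(neighbours)
    return repaired


def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Made-up, roughly level series with one spike at index 3.
data = [10.0, 11.0, 10.0, 40.0, 11.0, 10.0, 11.0]
anomalies = detect_anomalies(data)
repaired = repair_anomalies(data, anomalies)

print(anomalies, repaired)
print(ses_forecast(data), ses_forecast(repaired))  # forecast before and after repair
```

A global median/MAD rule only suits series without strong trend or seasonality; for other data the detection step would need a different rule.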

  • Book Chapter
  • Cite count: 1
  • 10.1007/978-981-99-0550-8_26
Time Series Analysis and Forecast Accuracy Comparison of Models Using RMSE–Artificial Neural Networks
  • Jan 1, 2023
  • Nama Deepak Chowdary + 3 more

The primary aim of our research paper is to demonstrate the time series analysis and forecast accuracy of different selected models based on neural networks. Time series modeling and forecasting are fundamentally important to many practical applications. As a result, there have been numerous ongoing research projects on this topic for many months. Numerous significant models have been put forward in the literature to enhance the precision and efficacy of time series modeling and forecasting. The purpose of this research is to give a brief overview of some common time series forecasting methods that are implemented, along with their key characteristics. The most frugal model is chosen with great care when fitting one to a data set of Pune precipitation data from 1965 to 2002. We have utilized the RMSE (root mean square error) as a performance index to assess forecast accuracy and to contrast several models that have been fitted to a time series. We applied feed-forward, time-lagged, and seasonal neural networks and long short-term memory models to the selected dataset. The long short-term memory neural model worked better than the other models.

  • Conference Article
  • Cite count: 8
  • 10.1109/fuzzy.1999.793266
Fuzzy logic based automatic rule generation and forecasting of time series
  • Jan 1, 1999
  • A.K Palit + 1 more

An algorithm is proposed that automatically generates the fuzzy rules from time series data and can subsequently be used for forecasting of the same time series. The effectiveness of the algorithm, measured by the performance indices such as the sum squared error (SSE), root mean squared error (RMSE/MSE) and the mean absolute error (MAE), is demonstrated on forecasting of chaotic time series, as well as on forecasting of homogeneous non-stationary time series with and without seasonality and trend components.
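For reference, the performance indices named above have standard definitions, sketched here in Python. The function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mse(actual, predicted):
    """Mean squared error: SSE divided by the number of points."""
    return sse(actual, predicted) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up observations and forecasts.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]

print(sse(actual, predicted), mse(actual, predicted))
print(rmse(actual, predicted), mae(actual, predicted))
```

Because errors are squared before averaging, SSE, MSE, and RMSE weight large errors more heavily than MAE does.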

  • Research Article
  • 10.33423/jsis.v14i3.2109
Forecast Precision and Forecast Accuracy from Moving Average and Moving Median Methods on Skewed Lognormal Time Series
  • Jul 18, 2019
  • Journal of Strategic Innovation and Sustainability
  • Louie Ren + 1 more

We examine the forecast precision and accuracy for forecasts from moving average and moving median methods on skewed i.i.d. time series following various lognormal probability distributions. Overall, we recommend the Moving Average method, MA, when forecasting time series that follow lognormal distributions.
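The two forecasting rules being compared can be sketched as follows. This is a minimal illustration, assuming a one-step-ahead forecast computed from the previous `window` observations; the helper name `moving_forecasts`, the window length, and the hand-made series are assumptions, and the sketch does not reproduce the paper's lognormal experiments or its recommendation.

```python
from statistics import mean, median


def moving_forecasts(series, window, summary):
    """One-step-ahead forecasts: summarise the previous `window` values."""
    return [summary(series[t - window:t]) for t in range(window, len(series))]


# Hand-made skewed-looking series with one large value.
series = [1.0, 2.0, 1.5, 8.0, 1.2, 1.8, 2.2]

ma = moving_forecasts(series, 3, mean)    # moving average forecasts
mm = moving_forecasts(series, 3, median)  # moving median forecasts

print(ma)
print(mm)
```

Passing the summary function as an argument keeps the two methods identical in everything except how the window is summarised.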

  • Book Chapter
  • Cite count: 1
  • 10.1007/978-3-642-39869-8_1
Introduction: Intelligent Fashion Forecasting
  • Oct 26, 2013
  • Tsan-Ming Choi + 2 more

Forecasting is a classic topic of information systems [1] and it is a crucial part for companies in the fashion apparel industry. Despite the fact that there is no “perfect” forecast, forecasting for highly structured data (e.g., the time series with high seasonality or trend) is known to be “easy” because there are many well-established models which provide the needed analytical formulations [13, 18]. For example, Hott [1] develop analytical models with closed-form expressions for forecasting time series with prominent features of seasonality and trend by using the exponentially weighted moving average method. In addition, more sophisticated statistical methods such as SARIMA [2] and ARIMA have also been widely applied for these structured forecasting problems with good performance. However, for many real-life applications in the fashion industry, the data patterns are notorious for being highly volatile and it is very difficult, if not impossible, to analytically learn about the underlying pattern, and hence the well-established and traditional statistical methods will fail to make a sound prediction. As a result, recent advances in artificial intelligence (AI) technologies have provided an alternative way of producing precise and more accurate forecasting results for fashion sales time series. For example, Au et al. [4] explore the fashion sales forecasting problem for fashion retailers by using evolutionary neural networks (ENN). They find that ENN can substantially enhance the forecasting accuracy compared to various other traditional methods. Although AI methods such as ENN can produce highly accurate forecasting results for volatile data sets, they suffer from a major drawback: they are slow (e.g., ENN can take hours to generate the forecasting results). This shortcoming becomes a major barrier which hinders the application of AI methods for forecasting in the real world.
Recently, in the literature, there are some innovative proposals and studies from different perspectives for establishing intelligent efficient forecasting systems with a focus on speed. Many of these proposed systems and models are inspiring and can lead to many promising applications. For example, El-Bakry and Mastorakis [5] propose an innovative approach which speeds up the prediction stage. To be specific, their method improves the forecasting speed by applying cross correlation between the whole input data and the weights of neural networks in the frequency domain. El-Bakry and Mastorakis prove analytically that this proposal can speed up the whole forecasting process and they call the resulting neural network a high speed neural network (HSNN) and discuss its use in time series forecasting. In Choi et al. [2], in order to enhance the accuracy and versatility of SARIMA in conducting fashion sales forecasting, a novel hybrid approach by wavelet transform is developed. To be specific, Choi et al. propose a scheme in which the original fashion sales time series is decomposed into components by wavelet transform. By conducting forecasting at the component level, the respective prediction results are obtained. Finally, in order to get the time series forecast for the original sales data, the component-level forecasts are transformed back to the original time series forecast. This hybrid wavelet transform SARIMA method has been tested with real and artificial data sets. Its performance is compared to both the pure statistical methods as well as some traditional AI methods, and is found to be satisfactory. Most recently, inspired by the strengths and weaknesses of the pure statistical method (PSM) and the extended extreme learning machines (EELM), Yu et al. [6] develop a novel algorithm which combines the EELM and the PSM to conduct intelligent fast forecasting for fashion sales time series. Their method employs a sophisticated scheme to determine the optimal parameters of the algorithm which can achieve the best possible (expected) accuracy with EELM and PSM within the given time limit constraint. In addition to the pure statistical method, there are other newly emerged models which are fast and can yield comparable forecasting accuracies. For instance, the Grey Model (GM) [25] is one such model. The GM has been employed in the study of fashion trend forecasting, and very favorable results have been reported in [22]. Models like GM are also suitable candidates for modeling the fashion forecasting problems. Similarly, research work as in [24] would also require an algorithm which intelligently chooses between the models to accomplish efficient forecasting tasks.

  • Research Article
  • 10.3897/jucs.114357
Multi-Step-Ahead Time Series Forecasting using Deep Learning and Fuzzy Time Series-based Error Correction Method
  • Oct 28, 2024
  • JUCS - Journal of Universal Computer Science
  • Samit Bhanja + 2 more

Recently, time series forecasting has become one of the prime application areas of climatology, economics and industries. Many research works are conducted to forecast the time series more accurately. But few of them concentrate on predicting the time series over an extended future horizon, and there is also scope to improve their forecasting accuracy. This work proposes a multi-step-ahead forecasting method to produce a stable and accurate forecasting result for the extended future horizon. Firstly, a deep learning-based forecasting model is proposed to predict the time series. Secondly, a fuzzy time series-based error correction model is implemented to enhance the prediction performance of the deep learning model. Here, to optimize all the fuzzy time series (FTS) parameters in an integrated way, an integrated butterfly optimization (FTS-IBO) algorithm is proposed. In this study, two different types of real-world multivariate time series datasets are used to analyze the forecasting performance of the proposed model. The performance of the proposed FTS-IBO algorithm is compared with the traditional butterfly optimization (FTS-BO) algorithm. The experimental results show that the FTS-IBO technique is superior to the FTS-BO technique. The forecasting performance of the proposed model has also been compared with other state-of-the-art models, and the simulation results show that the proposed model produces more accurate predictions for multi-step-ahead time series forecasting problems than the other models.

  • Conference Article
  • Cite count: 1
  • 10.1109/itaic54216.2022.9836892
Single-Step Time Series Forecasting Based on Multilayer Attention and Recurrent Highway Networks
  • Jun 17, 2022
  • Fenggang Lai + 5 more

Multivariate time series forecasting plays an important role in many fields. However, due to the complex patterns of multivariate time series and the large amount of data, time series forecasting is still a challenging task. We propose a single-step forecasting method for time series based on multilayer attention and recurrent highway networks. The method targets the shortcomings of current traditional time series forecasting methods in extracting the temporal and spatial correlation characteristics of variables, as well as the two major problems existing in traditional recurrent neural networks, and aims to improve the accuracy of time series forecasting. This paper first formally defines the single-step time series forecast, then introduces the Attn-RHN (multilayer attention based recurrent highway networks) method in detail, and finally verifies the feasibility of our method on the corresponding data set.

  • Single Report
  • Cite count: 38
  • 10.20955/wp.1984.022
On the Accuracy of Time Series, Interest Rate and Survey Forecasts of Inflation
  • Jan 1, 1984
  • R W Hafer + 1 more

There has been much work examining and evaluating different methodologies in forecasting inflation. For example, Fama (1975, 1977) develops an interest rate model to predict the 1-month-ahead rate of inflation using the CPI. Fama and Gibbons (1982, 1984) modify the interest rate model to account for a nonconstant real rate of interest and compare it to a univariate time-series forecasting model and to the Livingston forecasts. Carlson (1977a, 1977b) and others also analyze the consensus forecasts from the Livingston survey pertaining to multiperiod forecasts of inflation. Pearce (1979) considered the Livingston survey and time-series forecasts. Our purpose in this paper is to investigate further the relative forecasting ability of three different methodologies. In contrast to previous studies that have generally used the CPI to measure inflation, our analysis uses the implicit GNP deflator. This modification is motivated largely by the fact that calculations of the CPI inflation rate often are distorted greatly by changes in relative prices (e.g., Blinder 1980; Fischer 1981). This paper compares the accuracy of three different inflation forecasting procedures. These include a univariate time-series model, an interest rate model based on the methodology of Fama and Gibbons, and the median forecasts derived from the American Statistical Association-National Bureau of Economic Research survey. The evidence presented is based on ex ante forecasts of quarterly inflation rates using the GNP deflator for the period 1970:I-1984:II. Based on the evidence presented, the general conclusion is that the survey forecasts provide the most accurate inflation forecasts.

  • Research Article
  • Cite count: 60
  • 10.1111/j.1467-8667.2009.00646.x
Artificial Neural Networks for Forecasting of Fuzzy Time Series
  • Feb 10, 2010
  • Computer-Aided Civil and Infrastructure Engineering
  • U Reuter + 1 more

In this article, an artificial neural network for modeling and forecasting of fuzzy time series is presented. Modeling fuzzy time series with fuzzy data as random realizations of an underlying fuzzy random process enables forecasting of future fuzzy data following the observed time series. Analysis and forecasting of time series with fuzzy data may be carried out with the aid of artificial neural networks. A significant advantage is the fact that neural networks do not require a predetermined process model to simulate and forecast time series possessing fuzzy random characteristics. Artificial neural networks have the ability to learn the characteristics of an existing fuzzy time series, to represent the underlying fuzzy random process, and to forecast future fuzzy data following the time series observed. The algorithms developed are demonstrated using a numerical example.

  • Research Article
  • Cite count: 190
  • 10.1016/j.engappai.2019.08.018
Hybrid structures in time series modeling and forecasting: A review
  • Sep 5, 2019
  • Engineering Applications of Artificial Intelligence
  • Zahra Hajirahimi + 1 more

  • Book Chapter
  • Cite count: 51
  • 10.1016/b978-0-12-814761-0.00012-5
Chapter 12 - Time Series Forecasting
  • Jan 1, 2019
  • Data Science
  • Vijay Kotu + 1 more

  • Research Article
  • Cite count: 15
  • 10.1016/j.apm.2018.08.017
A new method to mitigate data fluctuations for time series prediction
  • Aug 29, 2018
  • Applied Mathematical Modelling
  • Chong Li + 2 more

  • Research Article
  • Cite count: 23
  • 10.1016/j.knosys.2020.106467
Comparative study on the time series forecasting of web traffic based on statistical model and Generative Adversarial model
  • Oct 18, 2020
  • Knowledge-Based Systems
  • Kun Zhou + 3 more

  • Research Article
  • Cite count: 21
  • 10.1007/s11063-024-11656-3
Leveraging Hybrid Deep Learning Models for Enhanced Multivariate Time Series Forecasting
  • Aug 23, 2024
  • Neural Processing Letters
  • Amal Mahmoud + 1 more

Time series forecasting is crucial in various domains, ranging from finance and economics to weather prediction and supply chain management. Traditional statistical methods and machine learning models have been widely used for this task. However, they often face limitations in capturing complex temporal dependencies and handling multivariate time series data. In recent years, deep learning models have emerged as a promising solution for overcoming these limitations. This paper investigates how deep learning, specifically hybrid models, can enhance time series forecasting and address the shortcomings of traditional approaches. This dual capability handles intricate variable interdependencies and non-stationarities in multivariate forecasting. Our results show that the hybrid models achieved lower error rates and higher $R^2$ values, signifying their superior predictive performance and generalization capabilities. These architectures effectively extract spatial features and temporal dynamics in multivariate time series by combining convolutional and recurrent modules. This study evaluates deep learning models, specifically hybrid architectures, for multivariate time series forecasting. On two real-world datasets - Traffic Volume and Air Quality - the TCN-BiLSTM model achieved the best overall performance. For Traffic Volume, the TCN-BiLSTM model achieved an $R^2$ score of 0.976, and for Air Quality, it reached an $R^2$ score of 0.94. These results highlight the model’s effectiveness in leveraging the strengths of Temporal Convolutional Networks (TCNs) for capturing multi-scale temporal patterns and Bidirectional Long Short-Term Memory (BiLSTMs) for retaining contextual information, thereby enhancing the accuracy of time series forecasting.
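The coefficient of determination reported above has a standard definition, one minus the ratio of the residual sum of squares to the total sum of squares, sketched here in Python. The function name and the sample values are illustrative assumptions; they are unrelated to the Traffic Volume and Air Quality datasets or to the scores reported in the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up observations and forecasts.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(r_squared(actual, predicted))
```

A value of 1 means the forecasts match the observations exactly, and a value of 0 means they do no better than always predicting the mean of the observations.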
