Abstract

The objective of this research is to develop a fast, simple method for detecting and replacing extreme spikes in high-frequency time series data. The method consists primarily of a nonparametric procedure that seeks a balance between fidelity to the observed data and smoothness. By examining the absolute difference between the original and smoothed values, the technique can also detect outliers and, where necessary, replace them with less extreme values.
Unlike other filtering procedures found in the literature, our method does not require a model to be specified for the data. Additionally, the filter makes only a single pass through the time series. Experiments show that the new method is a valid data preparation tool for ensuring that time series modeling is supported by clean data, particularly in complex settings such as those involving high-frequency data.
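The abstract describes three steps: smooth the series, take the absolute difference between each observation and its smoothed value, and replace observations whose difference is too large. The Python sketch below chains these steps under explicit assumptions. The smoother is a Whittaker-Henderson-style penalized least-squares fit that minimizes `lam * sum((p_t - s_t)**2) + (1 - lam) * sum((Delta^m s_t)**2)`, used as a stand-in because the NLF itself is not specified in this summary. The outlier threshold is `k` times the standard deviation of the residuals, a hypothetical rule rather than the paper's. All function and parameter names (`smooth`, `despike`, `lam`, `m`, `k`) are illustrative.

```python
from math import comb
from statistics import pstdev


def solve(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            if f:
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def smooth(p, lam=0.5, m=2):
    """Minimise lam*sum((p-s)^2) + (1-lam)*sum((Delta^m s)^2) over s.

    ASSUMPTION: a Whittaker-Henderson-style stand-in, not the paper's NLF.
    Setting the gradient to zero gives (lam*I + (1-lam)*D'D) s = lam*p,
    where D is the m-th order difference matrix.
    """
    n = len(p)
    # (Delta^m s)_i = sum_j c_j * s_{i+j} with c_j = (-1)^(m-j) * C(m, j)
    c = [(-1) ** (m - j) * comb(m, j) for j in range(m + 1)]
    a = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n - m):
        for j in range(m + 1):
            for k in range(m + 1):
                a[i + j][i + k] += (1.0 - lam) * c[j] * c[k]
    return solve(a, [lam * v for v in p])


def despike(p, lam=0.5, m=2, k=3.0):
    """Flag |p_t - s_t| > threshold and replace flagged p_t by s_t."""
    s = smooth(p, lam, m)
    resid = [v - sv for v, sv in zip(p, s)]
    # HYPOTHETICAL threshold rule: k standard deviations of the residuals.
    threshold = k * pstdev(resid)
    flagged = [abs(r) > threshold for r in resid]
    cleaned = [sv if f else v for v, sv, f in zip(p, s, flagged)]
    return cleaned, flagged


# Usage: a flat series at 10 with one extreme spike at t = 25.
series = [10.0] * 50
series[25] = 100.0
cleaned, flagged = despike(series)
print([t for t, f in enumerate(flagged) if f])
print(round(cleaned[25], 2))
```

Because the spike also pulls up the smoothed values around it, replacing an outlier with its own smoothed value only partly removes it. A dense solve is used here for readability; for long series a banded solver would be the natural choice, since the system matrix has bandwidth m.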

Highlights

  • An important topic in time series analysis is how to deal with data that consist of on-the-minute, hourly, daily or weekly observations

  • The paper presents the normalized linear filter (NLF) together with the computation of the thresholds beyond which outliers are detected

  • Brooks et al. (1988) applied the generalized cross-validation (GCV) score suggested in Golub & Wahba (1979)
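Brooks et al. (1988) applied the GCV score of Golub & Wahba (1979); a common use of such a score is choosing a smoothing constant. The Python sketch below evaluates GCV(λ) = n · RSS(λ) / [n − tr H(λ)]² for a linear smoother s = Hp and picks the best λ from a grid. It assumes a Whittaker-Henderson-style penalized smoother with weights λ and 1 − λ, for which H = λ(λI + (1 − λ)DᵀD)⁻¹; this is an illustrative stand-in, not necessarily the smoother or the selection rule used in the paper, and the function names are hypothetical.

```python
from math import comb


def system_matrix(n, lam, m=2):
    """lam*I + (1-lam)*D'D for the m-th order difference matrix D (assumed smoother)."""
    c = [(-1) ** (m - j) * comb(m, j) for j in range(m + 1)]
    a = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n - m):
        for j in range(m + 1):
            for k in range(m + 1):
                a[i + j][i + k] += (1.0 - lam) * c[j] * c[k]
    return a


def invert(a):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gcv_score(p, lam, m=2):
    """GCV(lam) = n * RSS / tr(I - H)^2 for the smoother s = H p."""
    n = len(p)
    h = [[lam * v for v in row] for row in invert(system_matrix(n, lam, m))]
    s = [sum(hij * pj for hij, pj in zip(row, p)) for row in h]
    rss = sum((pi - si) ** 2 for pi, si in zip(p, s))
    trace_resid = n - sum(h[i][i] for i in range(n))  # tr(I - H)
    return n * rss / trace_resid ** 2


def choose_lambda(p, grid, m=2):
    """Return the grid value of lam with the smallest GCV score."""
    return min(grid, key=lambda lam: gcv_score(p, lam, m))


# Usage: a trend plus a high-frequency wiggle.
series = [10.0 + 0.1 * t + 0.5 * (-1) ** t for t in range(30)]
print(choose_lambda(series, [0.1, 0.3, 0.5, 0.7, 0.9]))
```

The grid deliberately excludes λ = 1: there H = I, the smoother reproduces the data exactly, and the denominator tr(I − H) vanishes.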


Summary

Introduction

An important topic in time series analysis is how to deal with data that consist of on-the-minute, hourly, daily or weekly observations. Let $p_t \ge 0$ be the observed value at period $t$, and let $n$ be the length of the time series $p_t$, $t = 1, 2, \ldots, n$. We detect extreme spikes (or outliers) by examining the absolute difference between each observed value and the corresponding point on a reference curve.

The function that defines the reference curve has two terms: goodness of fit and smoothness. The first, $F_p$, measures fidelity to the data in terms of the squared deviations between smoothed and observed values. $F_m$ is the maximum of $F_p$, which occurs when all $m$-th differences of the reference curve are equal to zero; in this case, the reference curve is obtained by fitting a polynomial of degree $m-1$ to $p$ by least squares. A very simple choice of the smoothing constant is $\lambda = 0.5$, which gives fidelity and smoothness equal weight.

The final section discusses our findings and points out some improvements for further applications.
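The limiting case described in the introduction can be checked numerically. A polynomial of degree m − 1 has all m-th differences equal to zero, so when smoothness dominates completely the reference curve becomes the least-squares polynomial of degree m − 1, and the fidelity term reaches its maximum F_m. The Python sketch below computes that polynomial and F_m, assuming F_p is the plain sum of squared deviations (the summary does not give the exact scaling). The function names are hypothetical.

```python
def mth_differences(x, m):
    """Apply the forward difference operator m times."""
    for _ in range(m):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


def gauss_solve(a, b):
    """Solve a small dense system a x = b with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def lstsq_polynomial(p, degree):
    """Least-squares polynomial fit of the given degree to p_1, ..., p_n."""
    n = len(p)
    centre = (n + 1) / 2
    t = [i - centre for i in range(1, n + 1)]  # centred periods improve conditioning
    k = degree + 1
    # Normal equations X'X beta = X'p with X[i][j] = t_i ** j.
    xtx = [[sum(ti ** (r + c) for ti in t) for c in range(k)] for r in range(k)]
    xtp = [sum(ti ** r * pi for ti, pi in zip(t, p)) for r in range(k)]
    beta = gauss_solve(xtx, xtp)
    return [sum(b * ti ** j for j, b in enumerate(beta)) for ti in t]


def max_fidelity_loss(p, m):
    """F_m under the assumed F_p: squared deviations from the degree-(m-1) fit."""
    fit = lstsq_polynomial(p, m - 1)
    return sum((pi - fi) ** 2 for pi, fi in zip(p, fit)), fit


# Usage: a linear trend plus an alternating component, m = 2.
p = [3.0 + 2.0 * t + (-1) ** t for t in range(1, 21)]
f_m, fit = max_fidelity_loss(p, 2)
print(round(f_m, 4), max(abs(d) for d in mth_differences(fit, 2)))
```

For this series the fitted line has zero second differences up to rounding, and no other straight line, including the trend 3 + 2t itself, gives a smaller sum of squared deviations than F_m.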

Optimal Smoothing
Choice of the Smoothing Constant
Detection of Extreme Spikes
Segmentation
Monte Carlo Analysis
SARIMA Models
Effects of Smoothing on Point Forecast Accuracy
Simulation Design
Findings
Conclusions and Future Research