Abstract
The objective of this research is to develop a fast, simple method for detecting and replacing extreme spikes in high-frequency time series data. The method consists primarily of a nonparametric procedure that balances fidelity to the observed data against smoothness. Furthermore, by examining the absolute difference between original and smoothed values, the technique can detect outliers and, where necessary, replace them with less extreme data.
Unlike other filtering procedures found in the literature, our method does not require a model to be specified for the data. Additionally, the filter makes only a single pass through the time series. Experiments show that the new method can serve as a data preparation tool that ensures time series modeling is supported by clean data, particularly in a complex context such as high-frequency data.
Highlights
An important topic in time series analysis is how to deal with data that consist of on-the-minute, hourly, daily or weekly observations.
The paper is organized as follows: we present the normalized linear filter (NLF) together with the computation of the thresholds beyond which outliers are detected.
Brooks et al. (1988) applied the generalized cross-validation (GCV) score suggested in Golub & Wahba (1979).
Summary
An important topic in time series analysis is how to deal with data that consist of on-the-minute, hourly, daily or weekly observations. Let p_t ≥ 0 be the observed value at period t and n be the length of the time series p_t, t = 1, 2, ..., n. We detect extreme spikes (or outliers) by examining the absolute difference between each observed value and the corresponding point on the reference curve. The function has two terms: goodness of fit and smoothness. Fp measures fidelity to the data in terms of the squared deviations between smoothed and observed values. Fm is the maximum of F(p), which occurs when all m-th differences are equal to zero. In this case, the reference curve is determined by fitting to p a polynomial of degree (m−1) by least squares. A very simple choice is λ = 0.5, which implies that fidelity and smoothness are balanced. The final section discusses our findings and points out some improvements for further applications.
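The trade-off between fidelity and smoothness described above can be illustrated with a generic difference-penalized smoother. The sketch below is not the paper's normalized linear filter. It assumes an objective of the form λ·Σ(p_t − s_t)² + (1 − λ)·Σ(Δ^m s_t)², and it flags an observation when its absolute deviation from the smoothed curve exceeds a user-supplied `threshold`. The function names, the un-normalized objective, and the fixed threshold are all hypothetical stand-ins, since this summary does not give the paper's normalization or its threshold computation.

```python
import numpy as np


def difference_matrix(n, m):
    """Return the (n - m) x n matrix that takes m-th differences of a length-n vector."""
    return np.diff(np.eye(n), n=m, axis=0)


def smooth(p, lam=0.5, m=2):
    """Minimize lam * ||p - s||^2 + (1 - lam) * ||D s||^2 over s.

    Assumed objective (not taken from the paper); requires 0 < lam <= 1.
    Setting the gradient to zero gives (lam * I + (1 - lam) * D'D) s = lam * p.
    """
    p = np.asarray(p, dtype=float)
    n = p.size
    D = difference_matrix(n, m)
    A = lam * np.eye(n) + (1.0 - lam) * (D.T @ D)
    return np.linalg.solve(A, lam * p)


def replace_spikes(p, threshold, lam=0.5, m=2):
    """Flag points whose absolute deviation from the smoothed curve exceeds
    `threshold` (a hypothetical, user-chosen value) and replace them with
    the smoothed value. Returns (cleaned series, boolean flags)."""
    p = np.asarray(p, dtype=float)
    s = smooth(p, lam=lam, m=m)
    flags = np.abs(p - s) > threshold
    cleaned = np.where(flags, s, p)
    return cleaned, flags


# Usage: a linear series with one artificial spike at position 25.
series = np.arange(50, dtype=float)
series[25] += 500.0
cleaned, flags = replace_spikes(series, threshold=100.0)
```

With λ = 0.5 the two terms carry equal weight in this assumed objective. Because the smoothed curve is itself pulled toward the spike, the replacement value is less extreme than the original observation but does not return fully to the underlying trend.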
More From: International Journal of Statistics and Probability