Abstract

Outlier analysis is very often the first step in data pre-processing. Since it is performed on mostly raw data, it is crucial that the algorithms used are fast and reliable. These properties are hard to achieve when the analysed data are high-dimensional, as is the case with multiple time series data sets. In this article, various outlier detection methods for numerical data (distance distribution-based methods, angle-based methods, k-nearest neighbour, local density analysis) are presented and adapted to multiple time series data. The study also addresses the problem of choosing an appropriate similarity measure (L-p norms, Dynamic Time Warping, Edit Distance, Threshold Queries based Similarity) and its impact on the efficiency of further analysis. Work has also been put into determining the impact of the approach used to apply these measures to multivariate time series data. To compare the different approaches, a set of tests was performed on synthetic and real data.
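To illustrate how the choice of similarity measure interacts with a distance-based detector, here is a minimal sketch (not the authors' implementation) of one common k-nearest-neighbour formulation: each series is scored by its distance to its k-th nearest neighbour, with the distance function swapped between an L-p norm and unconstrained Dynamic Time Warping. The function names `lp_distance`, `dtw_distance` and `knn_outlier_scores`, and the toy data, are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, List, Sequence

Series = Sequence[float]


def lp_distance(a: Series, b: Series, p: float = 2.0) -> float:
    """L-p norm distance between two equal-length univariate series."""
    if len(a) != len(b):
        raise ValueError("L-p distance requires series of equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dtw_distance(a: Series, b: Series) -> float:
    """Classic O(n*m) Dynamic Time Warping with absolute-difference cost (no window)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def knn_outlier_scores(
    series: List[Series], k: int, dist: Callable[[Series, Series], float]
) -> List[float]:
    """Score each series by its distance to its k-th nearest neighbour (higher = more outlying)."""
    if not 1 <= k < len(series):
        raise ValueError("k must satisfy 1 <= k < number of series")
    scores = []
    for i, s in enumerate(series):
        # Distances from series i to every other series, ascending.
        dists = sorted(dist(s, t) for j, t in enumerate(series) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    data = [
        [0, 1, 2, 1, 0],
        [0, 0, 1, 2, 1],  # same shape, shifted right by one step
        [1, 2, 1, 0, 0],  # same shape, shifted left by one step
        [5, 5, 5, 5, 5],  # flat series far from the others
    ]
    print(knn_outlier_scores(data, 1, dtw_distance))
    print(knn_outlier_scores(data, 1, lp_distance))
```

With k = 1, both measures give the flat series the highest score, but DTW assigns the time-shifted series a score of 1.0 instead of the 2.0 produced by the Euclidean (L2) norm, because warping absorbs the shift. A multivariate extension (for example, summing per-dimension distances) is not shown here and is left as an assumption.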
