Similarity-Based Outlier Detection in Multiple Time Series

Grzegorz Gołaszewski

doi:10.1007/978-3-030-18058-4_10

Abstract

Outlier analysis is very often the first step in data pre-processing. Because it is performed on mostly raw data, it is crucial that the algorithms used are fast and reliable. Both properties are hard to achieve when the analysed data is high-dimensional, as is the case with multiple time series data sets. In this article, various outlier detection methods for numerical data (distance distribution-based methods, angle-based methods, k-nearest neighbour, local density analysis) are presented and adapted to multiple time series data. The study also addresses the problem of choosing an appropriate similarity measure (L-p norms, Dynamic Time Warping, Edit Distance, Threshold Queries based Similarity) and its impact on the efficiency of further analysis. The impact of the approach used to apply these measures to multivariate time series data is also examined. To compare the different approaches, a set of tests was performed on synthetic and real data.
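
To make the combination of a similarity measure and a detection method concrete, the following is a minimal sketch, not the implementation evaluated in the article. It pairs a textbook Dynamic Time Warping distance with a k-nearest-neighbour outlier score, where each series is scored by the distance to its k-th closest neighbour. The function names `dtw_distance` and `knn_outlier_scores`, the absolute-difference local cost, the default `k=2`, and the toy data are all illustrative assumptions, and the sketch handles univariate series only.

```python
import math


def dtw_distance(a, b):
    """Textbook Dynamic Time Warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    Runs in O(len(a) * len(b)) time and keeps only two rows in memory.
    """
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def knn_outlier_scores(series, k=2, dist=dtw_distance):
    """Score each series by the distance to its k-th nearest neighbour.

    A larger score means the series is farther from the rest of the set.
    Requires len(series) > k.
    """
    scores = []
    for i, s in enumerate(series):
        dists = sorted(dist(s, t) for j, t in enumerate(series) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: three similar ramps and one oscillating series.
    data = [
        [0, 1, 2, 3],
        [0, 1, 1, 2, 3],
        [0, 0, 1, 2, 3],
        [5, 9, 5, 9],
    ]
    scores = knn_outlier_scores(data, k=2)
    suspect = max(range(len(data)), key=scores.__getitem__)
    print("scores:", scores)
    print("most outlying series index:", suspect)
```

Passing a different `dist` callable (for example an L-p norm on equal-length series) swaps the similarity measure without touching the scoring step, which is the kind of comparison the study describes.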

Full Text