Correlation Joins over Time Series Data Streams Utilizing Complementary Dimension Reduction and Transformation

AmirReza Alizade Nikoo, Michael H. Böhlen, Sven Helmer

doi:10.1145/3626722


Open Access

https://doi.org/10.1145/3626722


Abstract

A common analysis task over a stream of time series is to find all pairs of windows whose correlation is above a given threshold. For a large number of streams, doing so naively, i.e., checking the Cartesian product, is too expensive. In essence, finding correlated pairs in a non-naive way boils down to a high-dimensional similarity join in a Euclidean space. While there are similarity join algorithms, such as Quickjoin and ε-kdB tree, they are inefficient for high-dimensional data. We propose CorrJoin, short for Correlation Join, that combines a complementary dimension reduction and transformation step with a subsequent double-filtering step. In the first step, we reduce the dimensionality of data stream windows by combining a fast but inaccurate method, Piecewise Aggregate Approximation (PAA), with an accurate and slow one, Singular Value Decomposition (SVD). Not only does SVD compensate for the weaknesses of PAA, it also transforms the data to make the first filter based on bucketing more effective. The second filter, which uses Euclidean distances, reduces the number of false positives before computing exact correlations. Our experiments reveal that in common settings, CorrJoin is an order of magnitude faster than state-of-the-art approaches (up to 20 times faster than Quickjoin).
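The reduction from correlation to Euclidean distance mentioned in the abstract can be illustrated with a small sketch. It is not the CorrJoin algorithm: the SVD step and the bucketing filter are left out, and the function names, window length, segment count, and threshold are assumptions chosen for illustration. For two windows that are centered and scaled to unit norm, the Pearson correlation equals 1 - d²/2, where d is their Euclidean distance, so a correlation threshold T becomes a distance threshold sqrt(2(1 - T)). The PAA distance, scaled by the square root of the segment length, never exceeds d, so it can discard pairs without computing the exact correlation.

```python
import numpy as np

def unit_normalize(w):
    """Center a window and scale it to unit Euclidean norm.
    Assumes the window is not constant (norm would be zero)."""
    c = w - w.mean()
    return c / np.linalg.norm(c)

def paa(x, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment.
    Assumes len(x) is divisible by `segments`."""
    return x.reshape(segments, -1).mean(axis=1)

def corr_from_dist(d):
    """Pearson correlation of two unit-normalized windows at distance d."""
    return 1.0 - d * d / 2.0

# Hypothetical parameters, chosen only for this illustration.
n, segments, threshold = 64, 8, 0.9

rng = np.random.default_rng(0)
base = rng.standard_normal(n)
a = unit_normalize(base)
b = unit_normalize(base + 0.2 * rng.standard_normal(n))

# Exact Euclidean distance and the correlation it implies.
d = np.linalg.norm(a - b)
exact = np.corrcoef(a, b)[0, 1]

# Distance threshold equivalent to the correlation threshold.
eps = np.sqrt(2.0 * (1.0 - threshold))

# PAA distance scaled by sqrt(segment length) is a lower bound on d,
# so d_paa > eps proves the pair is below the correlation threshold.
d_paa = np.sqrt(n / segments) * np.linalg.norm(paa(a, segments) - paa(b, segments))
can_prune = d_paa > eps
```

Because the PAA bound only ever underestimates the true distance, pruning on it produces no false negatives; the surviving pairs still need a check against the exact distance or correlation.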

Full Text