Abstract

The mining of time series data plays an important role in modern information retrieval and monitoring infrastructures. In particular, the identification of similarities within and across large time series is of great importance in analytics and knowledge discovery. For this task, the matrix profile similarity indexing approach, which encodes the correlations among snapshots of a time series, is well-established. However, it is computationally expensive, especially for long time series, as existing exact approaches mostly rely on exhaustive, exact query (search) operations and are inefficient. Similarly, existing approximate approaches are limited with respect to parallelism, scalability, or their extent of practicality. We, therefore, focus on an approximate parallel tree-based nearest-neighbors approach and address the weak scaling challenges raised when applied to large time series in HPC settings. We build on the existing concept of parallel iterative tree-based nearest neighbor solvers and introduce a novel approach for the approximate calculation of the matrix profile. To improve the performance and overcome weak scalability challenges, we exploit a mix of creating a forest of parallel trees on exclusive ensembles of resources combined with pipelining of iterations. We provide an implementation targeting large-scale CPU-based HPC systems and illustrate the performance of this new approach with experimental data. Finally, we demonstrate the mining of time series at billion-records-scale datasets on the SuperMUC-NG system.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.