Efficient Top-k Similarity Join of Massive Time Series Using MapReduce

Dehua Chen, Changgan Shen, Yue Li, Jiajin Le, Chunming Rong

doi:10.6138/jit.2014.15.6.13

Abstract

Top-k similarity join of time series, which finds the k most similar pairs of time series records, is a primitive operation used by many time series analysis applications. Computing it is challenging today, however, because many modern applications generate massive amounts of time series data, and it is difficult for a single centralized machine to perform a top-k similarity join over a large time series database efficiently. In this paper, we investigate how to perform the top-k similarity join of massive time series in parallel using MapReduce over a large cluster of commodity machines. Our proposed MapReduce-based algorithm consists of four steps; it takes as input a set of time series records and outputs an ordered list of the top-k closest pairs. To improve the efficiency of computing the top-k similarity join, we propose several solutions. First, we introduce an efficient distance function for time series based on locality-sensitive hashing (LSH) to speed up pairwise similarity comparison. Next, we propose all-pair partitioning methods to minimize the amount of data transferred between the map and reduce functions. Moreover, we use a serial computation strategy when parallelizing the computation of the local top-k closest pairs in each partition. Our performance study confirms the effectiveness and scalability of our MapReduce algorithms.
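The abstract does not spell out the four steps or the exact LSH-based distance function, so the following is only a minimal single-process sketch of the general idea, under stated assumptions: random-hyperplane (sign) signatures stand in for the LSH scheme, Hamming distance between signatures stands in for the distance function, and candidate pairs are spread over partitions round-robin. All function names (make_hyperplanes, signature, local_top_k, top_k_join) are hypothetical and are not taken from the paper.

```python
import heapq
import random
from itertools import combinations


def make_hyperplanes(num_bits, length, seed=0):
    """Draw num_bits random Gaussian hyperplanes (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(num_bits)]


def signature(series, hyperplanes):
    """One bit per hyperplane: the sign of the dot product with the series."""
    return tuple(
        1 if sum(x * h for x, h in zip(series, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def local_top_k(pairs, sigs, k):
    """'Reduce' side: the k closest pairs inside one partition, sorted."""
    scored = ((hamming(sigs[i], sigs[j]), i, j) for i, j in pairs)
    return heapq.nsmallest(k, scored)


def top_k_join(records, k, num_partitions=4, num_bits=64, seed=0):
    """Simulate the join: sign, partition pairs, local top-k, global merge."""
    length = len(next(iter(records.values())))
    planes = make_hyperplanes(num_bits, length, seed)
    sigs = {rid: signature(s, planes) for rid, s in records.items()}

    # 'Map' side: assign every candidate pair to a partition.
    # Round-robin is an assumption, not the paper's partitioning method.
    partitions = [[] for _ in range(num_partitions)]
    for n, pair in enumerate(combinations(sorted(sigs), 2)):
        partitions[n % num_partitions].append(pair)

    # Each partition contributes only its k best pairs, so the final merge
    # sees at most k * num_partitions candidates.
    local = [local_top_k(p, sigs, k) for p in partitions]
    return heapq.nsmallest(k, heapq.merge(*local))


if __name__ == "__main__":
    data = {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [1.0, 2.0, 3.0, 4.0],
        "c": [-1.0, -2.0, -3.0, -4.0],
    }
    print(top_k_join(data, k=2))
```

Each result is a (distance, id, id) tuple, closest first. With sign signatures of this kind, the fraction of differing bits estimates the angle between two series, so the Hamming distance acts as a cheap proxy for a cosine-style similarity. The global answer is correct with respect to that proxy because every global top-k pair must also be among the k best pairs of its own partition.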


More From: Journal of Internet Technology
