Abstract

Dynamic Time Warping (DTW) is a widely used distance measure in time series clustering. DTW distance is invariant to phase perturbations in time series but has quadratic complexity. An effective acceleration method must reduce the DTW utilization ratio during time series clustering; for example, TADPole uses both upper and lower bounds to prune a large fraction of expensive DTW calculations. To further reduce the DTW utilization ratio, we find that the linear-complexity L1-norm distance (Manhattan distance) is effective enough when the time series contain only small phase perturbations. Therefore, we propose a novel algorithm, time series clustering by Minimizing Dynamic Time Warping Utilization (MiniDTW), to accelerate time series clustering. In MiniDTW, the dataset is first greedily summarized by L1-norm distance into seed clusters, each of which comprises time series with small phase perturbations. Then, we develop a new Sparse Symmetric Non-negative Matrix Factorization (SSNMF) algorithm, which factorizes the DTW distance matrix of the seed cluster centers, to merge the seed clusters into the final clusters. Experiments on UCR time series datasets show that MiniDTW prunes 98.52% of the DTW utilization, whereas the counterpart method, TADPole, prunes only 75.56%; as a result, MiniDTW is 10 times faster than TADPole.
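
The contrast between the two distances can be illustrated with a minimal sketch. It assumes the absolute difference as the local cost inside DTW and no warping-window constraint; the function names `l1_distance` and `dtw_distance` are hypothetical and are not taken from the paper.

```python
def l1_distance(a, b):
    """Linear-time L1-norm (Manhattan) distance between equal-length series."""
    if len(a) != len(b):
        raise ValueError("L1-norm distance needs equal-length series")
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(a, b):
    """Quadratic-time DTW via dynamic programming (assumed local cost: |x - y|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# The same bump, shifted by one time step (a phase perturbation).
series_a = [0, 0, 1, 2, 1, 0, 0]
series_b = [0, 1, 2, 1, 0, 0, 0]
print(l1_distance(series_a, series_b), dtw_distance(series_a, series_b))
```

Under these assumptions the diagonal alignment is itself a valid warping path, so the L1-norm distance never falls below the DTW distance for equal-length series. The smaller the phase perturbation, the closer the two values tend to be.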

Highlights

  • Time series are among the most important data in modern data-driven society and can be generated from nearly every aspect of daily life [1]

  • We propose a novel algorithm, time series clustering by Minimizing Dynamic Time Warping Utilization (MiniDTW), to accelerate time series clustering

  • Since MiniDTW is proposed to accelerate time series clustering by reducing the DTW utilization ratio, TADPole [5] is the counterpart method most closely related to ours, because it accelerates time series clustering by pruning a fraction of DTW distance calculations using faster upper and lower bounds of DTW (L1-norm and LB_Keogh [35], respectively)
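
The general idea of bound-based pruning can be sketched as follows. This is a generic illustration and not TADPole's actual procedure: the helper `classify_pair`, the distance `threshold`, and the warping-window radius `r` are assumptions introduced here, and the local cost is assumed to be the absolute difference.

```python
def l1_distance(q, c):
    """Upper bound of DTW for equal-length series (the diagonal warping path)."""
    return sum(abs(x - y) for x, y in zip(q, c))


def lb_keogh(q, c, r):
    """LB_Keogh lower bound of window-constrained DTW (window radius r)."""
    n = len(c)
    total = 0.0
    for i, x in enumerate(q):
        window = c[max(0, i - r): min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        # Only the part of q that leaves the envelope of c contributes.
        if x > upper:
            total += x - upper
        elif x < lower:
            total += lower - x
    return total


def classify_pair(q, c, threshold, r):
    """Decide whether 'DTW(q, c) <= threshold' can be settled without DTW."""
    if l1_distance(q, c) <= threshold:
        return "within"      # upper bound already small enough
    if lb_keogh(q, c, r) > threshold:
        return "beyond"      # lower bound already too large
    return "needs_dtw"       # bounds are inconclusive, DTW is required


q = [0, 0, 1, 2, 1, 0, 0]
c = [0, 1, 2, 1, 0, 0, 0]
print(classify_pair(q, c, threshold=2, r=1))
```

Only pairs that fall between the two bounds require the quadratic DTW computation, which is what lowers the DTW utilization ratio.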

Summary

INTRODUCTION

Time series are among the most important data in modern data-driven society and can be generated from nearly every aspect of daily life [1]. Time series clustering is a basic technique for analyzing time series. It can discover the underlying structure of chaotic, raw datasets without ground-truth labels. To accelerate time series clustering with DTW distance, some methods, such as TADPole [5], reduce the DTW utilization ratio by pruning unnecessary DTW calculations with quickly computed upper and lower bounds of DTW. To reduce the DTW utilization ratio significantly, we apply the expensive DTW calculation only to a summarized time series dataset rather than to the original dataset. In MiniDTW, the original dataset is first “greedily” summarized into a small number of natural-shaped seed clusters with the efficient L1-norm distance. MiniDTW thus minimizes the DTW utilization ratio through dataset summarization with the linear-complexity L1-norm distance.
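
The effect of summarization on the number of DTW calculations can be illustrated with a minimal sketch. The summary above does not spell out the greedy procedure, so the leader-style rule below, the `radius` parameter, and the pair-count ratio used as a stand-in for the DTW utilization ratio are all assumptions made for illustration only.

```python
def l1_distance(a, b):
    """Linear-time L1-norm distance between equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))


def summarize(series, radius):
    """Greedy, leader-style grouping (an assumed stand-in for seed clustering).

    Each series joins the nearest existing seed whose center lies within
    `radius` in L1-norm distance; otherwise it starts a new seed cluster.
    Returns the center indices and the member indices of every seed.
    """
    centers, members = [], []
    for idx, s in enumerate(series):
        best, best_d = None, radius
        for k, center_idx in enumerate(centers):
            d = l1_distance(s, series[center_idx])
            if d <= best_d:
                best, best_d = k, d
        if best is None:
            centers.append(idx)
            members.append([idx])
        else:
            members[best].append(idx)
    return centers, members


def pair_count(n):
    """Number of pairwise distances among n items."""
    return n * (n - 1) // 2


data = [
    [0, 0, 1, 2, 1],
    [0, 0, 1, 2, 1.1],
    [5, 5, 5, 5, 5],
    [5, 5, 5, 5, 4.9],
    [0, 0.1, 1, 2, 1],
]
centers, members = summarize(data, radius=0.5)
# DTW would only be needed between seed centers, not between all series.
print(centers, members, pair_count(len(centers)) / pair_count(len(data)))
```

In this toy case, five series collapse into two seed clusters, so a full pairwise DTW matrix over the seed centers needs one DTW calculation instead of ten.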

RELATED WORK
L1-NORM DISTANCE AND DTW DISTANCE
PROBLEM DEFINITION
THE PROPOSED METHOD
DATASET SUMMARIZATION WITH L1-NORM DISTANCE
MERGE THE TIME SERIES SEED CLUSTERS
EVALUATION
EXPERIMENT SETUP
ACCURACY ANALYSIS
EFFICIENCY ANALYSIS
CONCLUSION