Abstract
Permutation distribution clustering is a complexity-based approach to clustering time series. The dissimilarity of time series is formalized as the squared Hellinger distance between the permutation distribution of embedded time series. The resulting distance measure has linear time complexity, is invariant to phase and monotonic transformations, and robust to outliers. A probabilistic interpretation allows the determination of the number of significantly different clusters. An entropy-based heuristic relieves the user of the need to choose the parameters of the underlying time-delayed embedding manually and, thus, makes it possible to regard the approach as parameter-free. This approach is illustrated with examples on empirical data.
Highlights
Clustering is an unsupervised technique to partition a data set into groups of similar objects with the goal of discovering an inherent but latent structure.
We examine a heuristic to automatically choose the parameters of the required time-delayed embedding of the time series and a heuristic to determine the number of clusters in hierarchical permutation distribution (PD) clustering.
We introduce a dissimilarity measure between time series that rests on a complexity-based divergence of the time series.
Summary
Clustering is an unsupervised technique to partition a data set into groups of similar objects with the goal of discovering an inherent but latent structure. We use the squared Hellinger distance, a metric approximation of the Kullback-Leibler divergence (Kullback and Leibler 1951), between the distributions of signal permutations of embedded time series to obtain a dissimilarity matrix of a set of time series. This matrix serves as input for further clustering or projection. Adding positive constants to, or multiplying the time series by, positive constants does not change its PD, making the PD invariant to monotonic normalizations, e.g., standardization by subtracting the mean and dividing by the standard deviation. This property relieves the researcher of the decision whether to normalize as a preprocessing step, which often has a serious impact on clustering results when common metric dissimilarity measures, e.g., the Euclidean distance, are used. We conclude with applications to simulated and real data, and a discussion of limitations and future work.
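The two ingredients described above, the permutation distribution of an embedded time series and the squared Hellinger distance between two such distributions, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the embedding dimension `m` and delay `t` are taken as given here (the paper selects them with an entropy-based heuristic), and the function names are chosen for this example.

```python
from itertools import permutations
from math import sqrt

def permutation_distribution(x, m=3, t=1):
    """Relative frequency of ordinal patterns in the time-delayed
    embedding of x with dimension m and delay t."""
    counts = {p: 0 for p in permutations(range(m))}
    n = len(x) - (m - 1) * t  # number of embedded vectors
    for i in range(n):
        window = [x[i + j * t] for j in range(m)]
        # ordinal pattern: the index order that sorts the window
        pattern = tuple(sorted(range(m), key=lambda j: window[j]))
        counts[pattern] += 1
    return {p: c / n for p, c in counts.items()}

def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2
                     for k in keys)

# The invariance noted above: a monotonic transformation preserves the
# ordinal patterns, so the PD (and hence the distance) is unchanged.
x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]
p = permutation_distribution(x)
q = permutation_distribution([2.0 * v + 5.0 for v in x])
print(squared_hellinger(p, q))  # 0.0 up to floating-point error
```

Because each time point enters only through its rank within a short window, a single extreme value perturbs at most a few patterns, which is the source of the robustness to outliers mentioned in the abstract.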