Abstract

Permutation distribution clustering is a complexity-based approach to clustering time series. The dissimilarity of time series is formalized as the squared Hellinger distance between the permutation distribution of embedded time series. The resulting distance measure has linear time complexity, is invariant to phase and monotonic transformations, and robust to outliers. A probabilistic interpretation allows the determination of the number of significantly different clusters. An entropy-based heuristic relieves the user of the need to choose the parameters of the underlying time-delayed embedding manually and, thus, makes it possible to regard the approach as parameter-free. This approach is illustrated with examples on empirical data.
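The following is a minimal sketch of the two ingredients named in the abstract, assuming ordinal patterns are obtained by time-delay embedding; the function names `permutation_distribution` and `hellinger_sq` are illustrative and not taken from the authors' implementation.

```python
import itertools
import numpy as np

def permutation_distribution(x, m=3, t=1):
    """Relative frequencies of ordinal patterns (order m, time delay t) in x."""
    x = np.asarray(x, dtype=float)
    patterns = list(itertools.permutations(range(m)))
    counts = dict.fromkeys(patterns, 0)
    n = len(x) - (m - 1) * t          # number of embedded windows
    for i in range(n):
        window = x[i:i + m * t:t]     # time-delay embedded window of length m
        counts[tuple(np.argsort(window))] += 1
    return np.array([counts[p] / n for p in patterns])

def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

# Series of different complexity should have clearly different permutation distributions.
rng = np.random.default_rng(1)
smooth, noisy = np.sin(np.linspace(0, 20, 300)), rng.normal(size=300)
print(hellinger_sq(permutation_distribution(smooth), permutation_distribution(noisy)))
```

Counting ordinal patterns takes a single pass over the series, which is where the linear time complexity claimed above comes from.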

Highlights

  • Clustering is an unsupervised technique to partition a data set into groups of similar objects with the goal of discovering an inherent but latent structure

  • We examine a heuristic to automatically choose the parameters of the required time-delayed embedding of the time series and a heuristic to determine the number of clusters in a hierarchical permutation distribution (PD) clustering (a sketch of the embedding heuristic follows this list)

  • We introduce a dissimilarity measure between time series that is based on a complexity-based divergence between their permutation distributions
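As a rough illustration of the embedding heuristic mentioned above, the sketch below assumes the heuristic selects the embedding dimension (and delay) whose mean normalized permutation entropy over the data set is smallest; the authors' exact selection rule may differ in its details, and all names here are illustrative.

```python
import itertools
import math
import numpy as np

def permutation_distribution(x, m, t=1):
    # Relative frequencies of ordinal patterns, as in the sketch after the abstract.
    patterns = list(itertools.permutations(range(m)))
    counts = dict.fromkeys(patterns, 0)
    n = len(x) - (m - 1) * t
    for i in range(n):
        counts[tuple(np.argsort(x[i:i + m * t:t]))] += 1
    return np.array([counts[p] / n for p in patterns])

def normalized_pd_entropy(x, m, t=1):
    """Shannon entropy of the PD, divided by log(m!) so the value lies in [0, 1]."""
    p = permutation_distribution(np.asarray(x, dtype=float), m, t)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))

def entropy_heuristic(series, dims=(2, 3, 4, 5, 6, 7), delays=(1,)):
    """Pick the (embedding dimension, delay) minimizing the mean normalized PD entropy."""
    return min(((m, t) for m in dims for t in delays),
               key=lambda mt: np.mean([normalized_pd_entropy(x, *mt) for x in series]))
```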


Summary

Introduction

Clustering is an unsupervised technique to partition a data set into groups of similar objects with the goal of discovering an inherent but latent structure. We use the squared Hellinger distance, a metric approximation of the Kullback-Leibler divergence (Kullback and Leibler 1951), between the permutation distributions of embedded time series to obtain a dissimilarity matrix for a set of time series. This matrix serves as input for further clustering or projection. Adding positive constants to a time series or multiplying it by them does not change its PD, making the PD invariant to monotonic normalizations, e.g., standardization by subtracting the mean and dividing by the standard deviation. This property relieves the researcher of the decision whether to normalize as a preprocessing step, which often has a serious impact on the clustering results when common metric dissimilarity measures such as the Euclidean distance are used. We conclude with applications to simulated and real data, and a discussion of limitations and future work.
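To make the pipeline described in this paragraph concrete, here is a hedged end-to-end sketch using NumPy and SciPy rather than any reference implementation: it builds the pairwise squared Hellinger distances between permutation distributions, feeds the resulting dissimilarity matrix to complete-linkage hierarchical clustering, and checks the claimed invariance under standardization. The helper names and the toy data are illustrative assumptions.

```python
import itertools
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def permutation_distribution(x, m=3, t=1):
    # Relative frequencies of ordinal patterns in the time-delay embedded series.
    patterns = list(itertools.permutations(range(m)))
    counts = dict.fromkeys(patterns, 0)
    n = len(x) - (m - 1) * t
    for i in range(n):
        counts[tuple(np.argsort(x[i:i + m * t:t]))] += 1
    return np.array([counts[p] / n for p in patterns])

def hellinger_sq(p, q):
    # Squared Hellinger distance between two discrete distributions.
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def pd_dissimilarity(series, m=3, t=1):
    # Pairwise squared Hellinger distances between the series' permutation distributions.
    pds = [permutation_distribution(np.asarray(x, dtype=float), m, t) for x in series]
    k = len(pds)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = hellinger_sq(pds[i], pds[j])
    return d

# Toy data: two phase-shifted noisy sine waves and two white-noise series.
rng = np.random.default_rng(0)
grid = np.linspace(0, 8 * np.pi, 500)
series = [np.sin(grid) + 0.1 * rng.normal(size=500),
          np.sin(grid + 1.0) + 0.1 * rng.normal(size=500),
          rng.normal(size=500),
          rng.normal(size=500)]

d = pd_dissimilarity(series)
z = linkage(squareform(d, checks=False), method="complete")
print(fcluster(z, t=2, criterion="maxclust"))  # sine-like series vs. noise series

# Invariance check: standardizing a series leaves its permutation distribution unchanged,
# because subtracting a constant and dividing by a positive constant preserve the ordering.
s = (series[0] - series[0].mean()) / series[0].std()
assert np.allclose(permutation_distribution(series[0]), permutation_distribution(s))
```

Because the PDs of the two sine-like series coincide up to sampling noise while the white-noise PDs are close to uniform, the two groups should separate without any prior normalization of the series.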

Permutation distribution
Entropy heuristic
Heuristic to determine the number of clusters
Implementation
Clustering of autoregressive time series
A comparison to UCRTSA
Clustering EEG data
Shape complexity
Discussion
Findings
Permutation index

