Abstract
Permutation distribution clustering is a complexity-based approach to clustering time series. The dissimilarity of time series is formalized as the squared Hellinger distance between the permutation distribution of embedded time series. The resulting distance measure has linear time complexity, is invariant to phase and monotonic transformations, and robust to outliers. A probabilistic interpretation allows the determination of the number of significantly different clusters. An entropy-based heuristic relieves the user of the need to choose the parameters of the underlying time-delayed embedding manually and, thus, makes it possible to regard the approach as parameter-free. This approach is illustrated with examples on empirical data.
Highlights
Clustering is an unsupervised technique to partition a data set into groups of similar objects with the goal of discovering an inherent but latent structure.
We examine a heuristic to automatically choose the parameters of the required time-delayed embedding of the time series and a heuristic to determine the number of clusters in hierarchical permutation distribution (PD) clustering.
We introduce a dissimilarity measure between time series that rests on a complexity-based divergence of the time series.
Summary
Clustering is an unsupervised technique to partition a data set into groups of similar objects with the goal of discovering an inherent but latent structure. We use the squared Hellinger distance, a metric approximation of the Kullback-Leibler divergence (Kullback and Leibler 1951), between the distributions of signal permutations of embedded time series to obtain a dissimilarity matrix of a set of time series. This matrix serves as input for further clustering or projection. Adding positive constants to, or multiplying the time series by, positive constants does not change its PD, making the PD invariant to monotonic normalizations, e.g., standardization by subtracting the mean and dividing by the standard deviation. This property relieves the researcher of the decision whether to normalize as a preprocessing step, which often has a serious impact on clustering results when common metric dissimilarity measures, e.g., the Euclidean distance, are used. We conclude with applications to simulated and real data, and a discussion of limitations and future work.
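The two ingredients described above, the permutation distribution of an embedded time series and the squared Hellinger distance between two such distributions, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the embedding dimension `m` and delay `t` are taken as given here (the paper selects them with an entropy-based heuristic), and the function names are chosen for this example.

```python
from itertools import permutations
from math import sqrt

def permutation_distribution(x, m=3, t=1):
    """Relative frequency of ordinal patterns in the time-delayed
    embedding of x with dimension m and delay t."""
    counts = {p: 0 for p in permutations(range(m))}
    n = len(x) - (m - 1) * t  # number of embedded vectors
    for i in range(n):
        window = [x[i + j * t] for j in range(m)]
        # ordinal pattern: the index order that sorts the window
        pattern = tuple(sorted(range(m), key=lambda j: window[j]))
        counts[pattern] += 1
    return {p: c / n for p, c in counts.items()}

def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2
                     for k in keys)

# The invariance noted above: a monotonic transformation preserves the
# ordinal patterns, so the PD (and hence the distance) is unchanged.
x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]
p = permutation_distribution(x)
q = permutation_distribution([2.0 * v + 5.0 for v in x])
print(squared_hellinger(p, q))  # 0.0 up to floating-point error
```

Because each time point enters only through its rank within a short window, a single extreme value perturbs at most a few patterns, which is the source of the robustness to outliers mentioned in the abstract.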