Abstract

Clustering of multivariate time series is a central problem in data mining with applications in many fields. Frequently, the clustering target is to identify groups of series generated by the same multivariate stochastic process. Most approaches to this problem either include a prior dimensionality reduction step, which may result in a loss of information, or rely on dissimilarity measures based on correlations and cross-correlations that ignore the serial dependence structure. We propose a novel approach to measuring dissimilarity between multivariate time series that aims to capture cross dependence and serial dependence jointly. Specifically, each series is characterized by a set of matrices of estimated quantile cross-spectral densities, where each matrix corresponds to a pair of quantile levels. The dissimilarity between every pair of series is then evaluated by comparing their estimated quantile cross-spectral densities, and the pairwise dissimilarity matrix is taken as the starting point for a partitioning around medoids algorithm. Since the quantile-based cross-spectra capture dependence in quantiles of the joint distribution, the proposed metric has a high capability to discriminate between high-level dependence structures. An extensive simulation study shows that our clustering procedure outperforms a wide range of alternative methods and is robust to the noise distribution, besides being computationally efficient. A real data application involving bivariate financial time series illustrates the usefulness of the proposed approach. The procedure is also applied to cluster nonstationary series from the UEA multivariate time series classification archive.
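
The pipeline described in the abstract (quantile cross-spectral features, pairwise dissimilarities, partitioning around medoids) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`qcd_features`, `pairwise_distances`, `pam`), the quantile levels 0.1, 0.5 and 0.9, the flat moving-average smoother, the Euclidean comparison of real and imaginary parts, and the random choice of initial medoids are all assumptions made for illustration.

```python
import numpy as np


def qcd_features(X, taus=(0.1, 0.5, 0.9), bandwidth=15):
    """Smoothed quantile cross-periodograms of one series X with shape (T, d).

    Assumptions: a flat moving average over neighbouring Fourier frequencies
    stands in for a kernel smoother, and T // 2 >= bandwidth.
    """
    X = np.asarray(X, dtype=float)
    T, _ = X.shape
    taus = np.asarray(taus, dtype=float)
    # Empirical quantile of every component at every level: shape (n_tau, d).
    q = np.quantile(X, taus, axis=0)
    # Centred indicator series 1{X_t,j <= q_j(tau)} - tau: shape (n_tau, T, d).
    clipped = (X[None, :, :] <= q[:, None, :]).astype(float) - taus[:, None, None]
    # DFT along time, one column per (quantile level, component) pair.
    dft = np.fft.fft(clipped, axis=1).transpose(1, 0, 2).reshape(T, -1)
    dft = dft[1:T // 2 + 1]  # keep the Fourier frequencies in (0, pi]
    # Cross-periodogram matrix at each frequency: shape (F, m, m).
    period = np.einsum("fa,fb->fab", dft, dft.conj()) / (2 * np.pi * T)
    flat = period.reshape(period.shape[0], -1)
    kernel = np.ones(bandwidth) / bandwidth
    # Smooth each matrix entry across frequencies.
    smooth = np.stack(
        [np.convolve(flat[:, c], kernel, mode="same") for c in range(flat.shape[1])],
        axis=1,
    )
    # Real and imaginary parts are stacked into one real feature vector.
    return np.concatenate([smooth.real.ravel(), smooth.imag.ravel()])


def pairwise_distances(features):
    """Euclidean distance between every pair of feature vectors."""
    F = np.asarray(features)
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pam(D, k, seed=0):
    """Swap phase of partitioning around medoids on a distance matrix D.

    Assumption: initial medoids are drawn at random instead of using the
    usual BUILD step.
    """
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = [int(i) for i in rng.choice(n, size=k, replace=False)]
    cost = D[:, medoids].min(axis=1).sum()
    while True:
        best_cost, best_medoids = cost, None
        # Try replacing each medoid by each non-medoid; keep the best swap.
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                c = D[:, trial].min(axis=1).sum()
                if c < best_cost - 1e-12:
                    best_cost, best_medoids = c, trial
        if best_medoids is None:
            break  # no swap lowers the total cost
        cost, medoids = best_cost, best_medoids
    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels
```

Given a list `series` of arrays of shape (T, d), a typical call would be `D = pairwise_distances([qcd_features(s) for s in series])` followed by `medoids, labels = pam(D, k)`. Working from the distance matrix alone keeps the clustering step independent of how the spectral features are estimated.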

Highlights

  • Time series clustering is a central problem in data mining with applications in many fields

  • Motivated by the good behaviour of the quantile autocovariance functions (QAF) metric in univariate time series (UTS) clustering, the aim of this paper is to extend this principle to multivariate time series (MTS) clustering by introducing a metric addressing jointly both cross dependence and serial dependence

  • The poor performance of the dynamic time warping-based distances was expected, since they are designed to compare shape patterns and are dominated by local comparisons

Summary

Introduction

Time series clustering is a central problem in data mining with applications in many fields. The objective is to split a large set of unlabelled time series realizations into homogeneous groups, so that similar series are placed in the same group and dissimilar series in different groups. This unsupervised classification process is useful for characterizing different dynamic patterns without the need to analyse and model each individual time series, which is computationally intensive and often far from the real target. Multivariate time series (MTS) are two-dimensional objects, which increases the computational complexity and makes some of the clustering procedures proposed for univariate time series (UTS) inefficient or even infeasible. High dimensionality and the difficulty of assessing dissimilarity make MTS clustering a challenging task.
