Abstract

Consider the problem of clustering objects described by several time-varying variables, for instance, classifying cities by several changing socioeconomic indices in geographical research. If the changing variables are recorded simultaneously as a multivariate time series, in which all subseries have equal length and may be correlated with one another, the problem becomes a multivariate time series clustering problem. Existing methods consider the correlations between distinct time series but overlook the shape of each time series, so multivariate time series with similar correlations but opposite shapes can be clustered into the same class. To overcome this problem, this paper proposes a two-phase multivariate time series clustering algorithm that considers both correlation and shape. In Phase I, the discrete wavelet transform is applied to compute the wavelet variance of each variable and the correlation coefficient between each pair of variables, and these features are used for an initial clustering of the multivariate time series; at this stage, time series with similar correlations but opposite shapes may still be assigned to the same cluster. In Phase II, the multivariate time series are clustered by shape using the symbolic aggregate approximation (SAX) method, which separates time series with similar correlations but opposite morphologies. The method is evaluated on multivariate time series of incoming and outgoing passenger volumes derived from Beijing IC card data collected between March 4 and March 17, 2013. In terms of the silhouette coefficient, our approach outperforms two popular multivariate time series clustering methods: a wavelet-based method and the SAX method.
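The comparison above is scored with the silhouette coefficient. The following minimal sketch computes the standard mean silhouette coefficient: for each object i, a(i) is its mean distance to the other members of its own cluster, b(i) is its smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). It assumes Euclidean distance on generic feature vectors and assigns s(i) = 0 to members of singleton clusters; the distance measure used in the paper's evaluation is not stated in this summary, and `silhouette_score` here is an illustrative function, not the authors' code.

```python
import math
from typing import Sequence

def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient of a clustering, using Euclidean distance."""
    n = len(points)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")
    total = 0.0
    for i in range(n):
        # Accumulate distances from point i to all other points, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(round(silhouette_score(points, [0, 0, 1, 1]), 3))  # 0.9
print(silhouette_score(points, [0, 1, 0, 1]) < 0)        # True
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate objects that lie closer to another cluster than to their own. For larger data sets, scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity.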

Highlights

  • Clustering objects with many time-dependent variables has been studied in data mining research, for example, classifying cities by multiple changing socioeconomic indices, identifying crop types from multiple remotely sensed image series, and categorizing the social functions of points of interest (POIs) from incoming and outgoing passenger flow series

  • We propose a two-phase clustering method for multivariate time series based on the wavelet transform and symbolic aggregate approximation (SAX), called WSAX, which has the following characteristics: 1) in Phase I, the inherent correlations between the variables of a multivariate time series are taken into account by using a wavelet transform to extract wavelet features of the original time series, and 2) in Phase II, shape-based clustering effectively distinguishes multivariate time series with opposite morphologies

  • The proposed two-phase algorithm, WSAX, combines the advantages of feature-based and shape-based clustering methods
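Phase II relies on SAX. The sketch below implements the standard SAX steps (z-normalization, piecewise aggregate approximation, and mapping to symbols through equiprobable N(0, 1) breakpoints) together with the usual MINDIST lower-bounding distance between SAX words, to show why a shape-based comparison separates series with opposite morphologies. The word length (`segments`), alphabet size, and distance that WSAX actually uses are not given in this summary; the values below are illustrative assumptions, and all function names are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

SYMBOLS = "abcdefghijklmnopqrstuvwxyz"

def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # flat series: every point maps to the mean
    return [(x - mu) / sigma for x in series]

def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-width frame."""
    if len(series) % segments:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    size = len(series) // segments
    return [fmean(series[k * size:(k + 1) * size]) for k in range(segments)]

def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into equiprobable regions."""
    if not 2 <= alphabet_size <= len(SYMBOLS):
        raise ValueError("alphabet_size must be between 2 and 26")
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

def sax_word(series, segments, alphabet_size):
    """Map a numeric series to a SAX word of length `segments`."""
    cuts = breakpoints(alphabet_size)
    values = paa(znormalize(series), segments)
    # Symbol index = number of breakpoints lying strictly below the PAA value.
    return "".join(SYMBOLS[sum(v > c for c in cuts)] for v in values)

def mindist(word_a, word_b, n, alphabet_size):
    """SAX lower-bounding distance between two words built from length-n series."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for sa, sb in zip(word_a, word_b):
        lo, hi = sorted((SYMBOLS.index(sa), SYMBOLS.index(sb)))
        if hi - lo > 1:  # identical or adjacent symbols contribute 0
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return (n / len(word_a)) ** 0.5 * total ** 0.5

rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = rising[::-1]
w_up = sax_word(rising, segments=4, alphabet_size=4)     # 'abcd'
w_down = sax_word(falling, segments=4, alphabet_size=4)  # 'dcba'
print(w_up, w_down, mindist(w_up, w_down, n=len(rising), alphabet_size=4))
```

The two-variable series (rising, rising) and (falling, falling) both have an inter-variable correlation of +1, so a correlation-only representation cannot tell them apart, whereas their SAX words 'abcd' and 'dcba' are far apart under MINDIST.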

Summary

INTRODUCTION

Clustering objects with many time-dependent variables has been studied in data mining research, for example, classifying cities by multiple changing socioeconomic indices, identifying crop types from multiple remotely sensed image series, and categorizing the social functions of points of interest (POIs) from incoming and outgoing passenger flow series. To extract feature vectors that represent a raw time series, Ye et al. applied generalized principal component analysis (GPCA) [10], and Guo et al. [11] and Wu and Philip [12] applied independent component analysis (ICA). However, these methods disregard the correlations between the variables of a multivariate time series. In wavelet-based feature extraction, the wavelet correlation at each scale is calculated for every pair of variables of the multivariate time series, and the wavelet variances and correlation coefficients are concatenated into a single vector that represents the multivariate time series. This approach has the advantage of constructing wavelet features of the time series while taking the correlations between variables into account.
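The wavelet feature vector described above can be sketched as follows. This is a minimal, hypothetical sketch: it assumes an orthonormal Haar DWT, estimates the wavelet variance at level j as the sum of squared detail coefficients divided by the series length, and computes the wavelet correlation as the normalized cross-product of the detail coefficients of two variables. The wavelet family, number of levels, and exact estimators used in the paper are not stated here, and `haar_details` and `wavelet_features` are illustrative names, not the authors' code.

```python
import math
from itertools import combinations

def haar_details(series, levels):
    """Detail coefficients of an orthonormal Haar DWT, one list per level (finest first)."""
    approx = [float(v) for v in series]
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("series length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]  # smooth part for next level
    return details

def wavelet_features(mts, levels):
    """Concatenate per-level wavelet variances and pairwise wavelet correlations.

    mts is a list of m variables, each an equal-length list of numbers; the result
    has levels * (m + m * (m - 1) / 2) entries.
    """
    n = len(mts[0])
    if any(len(var) != n for var in mts):
        raise ValueError("all variables must have the same length")
    decomposed = [haar_details(var, levels) for var in mts]
    features = []
    for j in range(levels):
        coeffs = [d[j] for d in decomposed]
        # Energy-based wavelet variance at level j (boundary effects ignored).
        variances = [sum(c * c for c in cj) / n for cj in coeffs]
        features.extend(variances)
        for p, q in combinations(range(len(mts)), 2):
            # Wavelet correlation, treating detail coefficients as zero-mean.
            cross = sum(u * v for u, v in zip(coeffs[p], coeffs[q])) / n
            denom = math.sqrt(variances[p] * variances[q])
            features.append(cross / denom if denom > 0 else 0.0)
    return features

# Toy 3-variable series: a base signal, a scaled copy, and a sign-flipped copy.
x = [1, 3, 2, 5, 4, 4, 0, 1]
feats = wavelet_features([x, [2 * v for v in x], [-v for v in x]], levels=2)
print(len(feats))  # 12
```

For these toy series, each level contributes variance entries in the ratio 1 : 4 : 1 followed by correlations +1, -1, and -1, matching the scale and sign relations among the three variables. A library such as PyWavelets (`pywt.wavedec`) could replace the hand-written Haar transform if other wavelet families are needed.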

NOTATIONS AND PROBLEM
Let $S$ represent a set of multivariate time series $S_i$ as …
WAVELET-BASED CLUSTERING OF MULTIVARIATE TIME SERIES
PHASE I
PHASE II
CLUSTERING VALIDITY
CONCLUSION